Chinese Whispers: The Cross-Border Application of AIGC in Art Appreciation
1. Introduction
The swift advancement of Artificial Intelligence Generated Content (AIGC) technology is increasingly impacting diverse domains of artistic creation and appreciation, thereby transforming the interactive relationships between individuals and artworks. AIGC presents a new source of creative inspiration for artistic expression, offering exploratory avenues and interpretative frameworks to navigate the changes characteristic of the digital age (Xu, 2024). Through the application of algorithms and data analysis, AIGC not only simulates the stylistic compositions inherent in artistic creation but also produces a variety of interpretive possibilities (Hur, 2023). By transforming visual elements into textual descriptions or structured data, AIGC enhances viewers’ multidimensional comprehension of composition, color, and style within artworks. This technology not only broadens the horizons of artistic creation but also initiates a digital revolution in modes of appreciation. It fosters innovative interactive perspectives that provide viewers with a more enriched artistic experience and facilitate interdisciplinary development opportunities within the realm of art education (Mayo, 2024).
Portraiture, a prominent mode of expression within the canon of Western art history, serves to represent the uniqueness of individual subjects while simultaneously reflecting the social contexts and cultural significances of the time, conveying the artist’s emotional and affective responses toward the figures portrayed (Stoleriu, 2015). Traditional art appreciation generally relies on the viewer’s understanding of art, professional expertise, and perceptual abilities. In contrast, AIGC focuses on recognizing the external attributes of artworks, analyzing the emotional aspects of portraits through intuitive visual elements, and reconstructing the artwork based on prompts generated by artificial intelligence (Chang et al., 2023). The reinterpretative capabilities provided by AIGC technology allow viewers to achieve a multidimensional understanding of the emotional characteristics inherent in portraiture. Additionally, structured data serves as a significant resource in the appreciation process, enabling a departure from the subjective limitations that frequently accompany human interpretations of art. This, in turn, enhances the comprehension of the aesthetic value embedded within these works (Elgammal et al., 2017).
Structured data in AIGC is primarily designed to organize and present the multifaceted information of artworks in a systematic and coherent format, offering concrete references for art appreciation. Employing advanced image recognition and data extraction techniques, AIGC precisely analyzes the visual characteristics of artworks, including color schemes, compositional arrangements, and textural nuances, facilitating a deeper understanding of artistic styles and creative methodologies. Through natural language generation (NLG) technology, AIGC further transforms historical contexts, cultural connotations, and the artist’s intent into accessible and well-structured descriptive data, enhancing comprehension of an artwork’s cultural and emotional dimensions (Li, 2024). Moreover, structured data supports cross-comparative analyses of multiple artworks, delivering interpretative insights from diverse perspectives. It also enables style simulation and creative reproduction, expanding its utility in art education and creative practices by introducing innovative approaches to artistic analysis and interpretation.
This study aims to investigate the potential cross-disciplinary applications of AIGC in the domain of art appreciation, with a particular focus on portraiture as the primary research sample. By harnessing the technological advantages offered by AIGC, the research seeks to establish a model of art appreciation that is facilitated through digital assistance tools. This model integrates concepts from formalism and basic emotion theories, synthesizing methodologies from art criticism while drawing upon existing literature in art education and cognitive studies. The anticipated findings are expected to not only enhance the appreciation of portraiture but also to improve the understanding of emotional expression, thereby broadening the applications of AIGC beyond mere creative production. This research aspires to develop multidimensional and personalized learning pathways within art education, fostering modernization and innovative development through the diverse interpretative capabilities of AIGC. Additionally, it aims to explore the broader potential applications of AIGC across various fields in the future.
2. Materials and Methods
2.1. AI-Assisted Processes in Art Appreciation
The integration of AI into art appreciation introduces an innovative and systematic framework that enhances viewers’ intuitive understanding of artworks’ fundamental elements. By deconstructing visual compositions, AI-assisted systems provide supplemental guidance throughout the appreciation process, thereby enriching the viewer’s interpretative experience. This study integrates art appreciation theories with AI technologies, as illustrated in Table 1, which delineates the components associated with each stage and explicates how the interrelationships among these stages reflect both theoretical alignment and practical applicability. This methodology proposes a holistic and nuanced framework for advancing the understanding of art appreciation.
Table 1. Theoretical framework of art education in AI art appreciation.
| Theory | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
| --- | --- | --- | --- | --- |
| Model of Art Criticism (Feldman, 1970) | Description | Analysis | Interpretation | Judgement |
| Questioning Strategies (Taunton, 1983) | Cognitive Memory Question | Convergent Question | Divergent Question | Evaluation Question |
| Bloom’s Taxonomy of Learning Objectives (Anderson & Krathwohl, 2001) | Remember, Understand | Apply, Analyze | Evaluate | Create |
| Integrating AIGC into the Art Appreciation Process | | | | |
| The Process of AIGC | Recognize, Describe | Modify, Dissect | Reframe | Integrate |
| Milestones | Initial Prompt | Modified Prompt | AIGC Image | Aesthetic Experience |
The vocabulary employed within AIGC processes is derived entirely from the cognitive domain verb list of Bloom’s Taxonomy.
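The correspondences in Table 1 can also be held as a small lookup structure, which makes the pairing between Bloom levels and AIGC process steps explicit. The following is a minimal sketch in Python; the dictionary keys (`feldman`, `taunton`, `bloom`, `aigc`, `milestone`) and the helper `aigc_step_for` are illustrative names introduced here, not part of the study, and the within-phase pairing assumes the order in which the table lists the terms.

```python
# Table 1 as a lookup keyed by phase number; labels are copied from the table.
# Key names and the helper below are illustrative assumptions.
FRAMEWORK = {
    1: {"feldman": "Description", "taunton": "Cognitive Memory Question",
        "bloom": ("Remember", "Understand"), "aigc": ("Recognize", "Describe"),
        "milestone": "Initial Prompt"},
    2: {"feldman": "Analysis", "taunton": "Convergent Question",
        "bloom": ("Apply", "Analyze"), "aigc": ("Modify", "Dissect"),
        "milestone": "Modified Prompt"},
    3: {"feldman": "Interpretation", "taunton": "Divergent Question",
        "bloom": ("Evaluate",), "aigc": ("Reframe",),
        "milestone": "AIGC Image"},
    4: {"feldman": "Judgement", "taunton": "Evaluation Question",
        "bloom": ("Create",), "aigc": ("Integrate",),
        "milestone": "Aesthetic Experience"},
}


def aigc_step_for(bloom_level):
    """Return (phase number, AIGC step) paired with a Bloom level."""
    for phase, row in FRAMEWORK.items():
        # Bloom levels and AIGC steps are listed in matching order per phase.
        for level, step in zip(row["bloom"], row["aigc"]):
            if level == bloom_level:
                return phase, step
    raise KeyError(bloom_level)


print(aigc_step_for("Analyze"))  # (2, 'Dissect')
```

Keeping the mapping in one structure allows each phase to be queried by any of its labels without restating the table.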
2.1.1. Recognize & Describe in Phase 1
Feldman’s (1970) Art Criticism Model includes a Description phase, which requires viewers to describe a work objectively and without bias, emphasizing precise visual observation and factual recording, devoid of subjective interpretation. This cognitive requirement aligns closely with Taunton’s (1983) concept of Cognitive Memory Question, which represents the lowest level of cognitive operation, aiming for learners to accurately recall known information, such as a work’s visual elements or technical characteristics (Subramaniam et al., 2016). Both approaches focus on recording basic facts, establishing a necessary foundation for further analysis and understanding.
Further comparisons reveal that Taunton’s Cognitive Memory Question is functionally equivalent to the Remember level in Bloom’s Taxonomy of Learning Objectives (Anderson & Krathwohl, 2001), as both involve the recall and reproduction of known knowledge. Once learners successfully remember and reproduce relevant information about an artwork, they transition to the Understand level in Bloom’s Taxonomy. Here, learners move beyond simple memorization and start interpreting and comprehending the described content to facilitate a deeper analysis of the artwork, understanding not only the elements within the piece but also how these elements interact to produce an artistic effect or convey emotional meaning (Carney, 1994). This shift represents the transition from basic memory to a higher level of understanding.
In the AIGC generative process, the Remember and Understand levels of Bloom’s Taxonomy correspond to the Recognize and Describe phase of AI-assisted image generation. The AIGC system initially recognizes the input image, analogous to the human Remember level, during which the system captures the essential components of the portrait. Subsequently, in light of this recognition, the AI generates a description, mirroring Bloom’s Understand cognitive level, where identified information is interpreted and transformed into coherent textual representations. Thus, the AIGC generative process mirrors the cognitive development model in human learning, from recognizing foundational data to performing semantic interpretation (Liang & Mokhtar, 2024).
In the process of AI generation, the description of an artwork serves not only as an end result but also as the basis for subsequent generative AI prompts. The AI-generated description provides directional guidance for the next creative phase, analogous to the function of the Initial Prompt in this research. The Initial Prompt, functioning as a system input, defines the subsequent generative framework and content structure. Through the Describe phase, AI can extend its outputs to further applications. This phase showcases how foundational descriptions evolve into creative outputs, reflecting the progression from cognitive memory and comprehension phases to practical application.
2.1.2. Modify & Dissect in Phase 2
In Feldman’s Art Criticism Model, the Analysis phase emphasizes the dissection and evaluation of structural elements in artworks, such as color, line, and composition. The goal is to understand how these elements work together to create the piece’s overall effect. At this phase, viewers employ logic and reasoning to reveal an artwork’s external structure and principles, closely aligning with Taunton’s concept of Convergent Questions. Convergent Questions are generally characterized as closed-ended inquiries that require learners to integrate their existing knowledge to arrive at a singular correct answer. This attribute renders them especially effective for the analysis of the formal elements within a specific work (Liang & Mokhtar, 2024). Both approaches require learners to engage in convergent thinking within multiple streams of information to generate precise understanding and interpretation.
A further examination indicates that Taunton’s framework of Questioning Strategies and the Apply and Analyze levels within Bloom’s Taxonomy mutually reinforce one another. Convergent Questions encourage learners to synthesize data into specific conclusions, a process that aligns with the Apply level, where learners must use learned knowledge in concrete scenarios and reason based on prior knowledge. Simultaneously, this process involves an analysis of the formal aspects of the work, embodying Bloom’s Analyze cognitive level, where one dissects information to discern its external structure or logic. At this phase, learners are not only using knowledge but also engaging in critical thinking to break down the complexity of the work.
The Apply and Analyze levels of Bloom’s Taxonomy correspond directly to the Modify and Dissect phases in the AIGC generative process. In AI prompt generation, when the initial output requires modification, the modification of the framework is guided by the results of specific analyses. This aligns with Bloom’s Apply level, as the system utilizes previous inputs to adapt to new contexts and implements adjustments based on the feedback received. Concurrently, the AI must perform a deep dissection of the artwork, aligning with Bloom’s Analyze level. AI systems in image analysis examine the elements and relationships within an image, enhancing the generative process with greater precision, similar to how humans interpret artworks (Hanninen, 2004).
The Modify phase within AI generation involves refining and optimizing previously generated prompts, paralleling the Modified Prompt employed in this phase of the study. The Modified Prompt is adjusted based on feedback from prior outputs, aiming to enhance the accuracy and alignment of generative results. Following the revised generative framework, the AI reanalyzes the inputs, reflecting an iterative process of refinement and improvement. This progression, from initial identification to formal analysis, and iterative modifications during practical implementation, culminates in a higher-quality output.
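The iterative character of the Modify phase can be illustrated with a short loop in which a prompt is revised until the reviewer returns no further corrections. This is a minimal sketch under stated assumptions: the prompt is represented as a dictionary of component descriptions, and `refine`, `example_review`, and the sample wording are hypothetical names and values introduced for illustration only; the study itself relies on expert feedback, not an automated reviewer.

```python
from typing import Callable, Dict, List

Prompt = Dict[str, str]  # component name -> descriptive text (assumed shape)


def refine(initial: Prompt,
           review: Callable[[Prompt], Prompt],
           max_rounds: int = 3) -> List[Prompt]:
    """Apply reviewer corrections until none remain or max_rounds is reached.

    Returns the full history, from the Initial Prompt to the last
    Modified Prompt.
    """
    history = [dict(initial)]
    for _ in range(max_rounds):
        corrections = review(history[-1])
        if not corrections:
            break  # reviewer is satisfied; stop iterating
        # Merge corrections over the previous prompt to form the next one.
        history.append({**history[-1], **corrections})
    return history


def example_review(prompt: Prompt) -> Prompt:
    """Hypothetical stand-in for expert feedback on one component."""
    text = prompt["Facial Expression"]
    if "melancholy" not in text:
        return {"Facial Expression": text + ", melancholy gaze"}
    return {}


history = refine({"Facial Expression": "downcast eyes"}, example_review)
print(history[-1]["Facial Expression"])  # downcast eyes, melancholy gaze
```

Keeping the whole history, instead of only the final prompt, makes it possible to compare the Initial Prompt with each Modified Prompt afterwards.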
2.1.3. Reframe in Phase 3
The Interpretation phase proposed by Feldman in the Art Criticism Model emphasizes the exploration of an artwork’s inherent meaning. This process requires viewers to not only rely on objective descriptions and analyses of the elements within the work but also to speculate on the intended themes and purposes behind it. Such interpretation is highly subjective, often shaped by the viewer’s personal cultural background and knowledge structure. This aligns with Taunton’s concept of the Divergent Question, which encourages learners to explore multiple possible answers, opening up potential for various interpretations. Divergent Questions, such as “What emotions or messages might this work convey?” are designed to stimulate learners’ critical thinking, prompting them to interpret the intrinsic meanings of artworks from multiple perspectives (Carney, 1994), complementing the Interpretation phase in Feldman’s model.
From Taunton’s Divergent Question to the Evaluate level in Bloom’s Taxonomy, there is a progression of cognitive logic. Divergent Questions are designed to stimulate open-ended thinking and encourage multifaceted interpretations among learners, while the Evaluate level of Bloom’s Taxonomy further requires learners to exercise judgment and appraisal of proposed perspectives or works, grounded in the diversity of insights obtained. The Evaluate level encompasses not only subjective judgments about the aesthetic value of the artwork but also the learner’s capacity to critically balance various perspectives. When faced with diverse interpretations, learners must utilize divergent thinking and apply aesthetic standards or specific critical frameworks to evaluate the reasonableness of each interpretation. This evaluation process exemplifies the higher-order cognitive skills emphasized by Bloom (Mason, 1982).
The Evaluate level in Bloom’s Taxonomy also correlates with the Reframe process in AI generation. In AIGC image generation, the system often interprets and reorganizes input data to create new works. This process parallels the selective judgment found at the Evaluate level, as AI must analyze input image elements and reconstruct the artwork based on specified conditions (such as style or composition). Although the AIGC process is algorithmically driven, it involves a selection process similar to the Evaluate cognitive level, where the system chooses the optimal combination to achieve a specific goal. The AI reconstruction of artworks can be viewed as a reinterpretation and creation based on pre-existing elements, reflecting how evaluation in higher-order thinking can lead to innovative outcomes as described in Bloom’s Taxonomy.
The Reframe phase in AI image generation directly corresponds to the AIGC Image results in this study. By analyzing input prompts and generating images based on set parameters, the AI demonstrates continuity from image analysis to reconstruction. The results of this research phase consist of AI-generated images that demonstrate AI’s capabilities in visual cognition and artistic creation. By reconstructing artworks through AI, this study not only explores the potential applications of digital technology in art appreciation but also provides a novel perspective on the integration of human judgment with machine computation within the domains of art education and creative processes.
2.1.4. Integrate in Phase 4
In Feldman’s Model of Art Criticism, Judgment constitutes the final phase of the evaluative process, requiring viewers to engage in a subjective assessment of the artwork. This Judgment encompasses not only an evaluation of the technical and aesthetic merits of the work itself but also an appraisal of its value within the context of personal background and socio-cultural factors. During this phase, viewers employ aesthetic standards and critical thinking, aligning with Taunton’s concept of the Evaluation Question. Taunton’s approach calls on learners not only to analyze the artwork but to critically examine specific materials or thematic perspectives, encouraging questions such as, “What is the value of this piece within the context of art history?” Such questions prompt learners to apply evaluative standards for deeper reflection (Carney, 1994), paralleling Feldman’s notion of Judgment by emphasizing the viewer’s capacity for a final evaluative stance following a comprehensive analysis of the work.
Taunton’s Evaluation Question also corresponds to the Create level within Bloom’s Taxonomy. The Evaluation Question is designed to encourage learners to engage in critical reflection on existing information or perspectives, thereby establishing a foundation for the Create level. In Bloom’s Taxonomy, the Create level represents the highest level of cognitive ability, requiring learners to synthesize existing knowledge and evaluations to generate new ideas, products, or solutions. After passing the Evaluate level in the previous phase, learners encounter the challenge of integrating evaluative insights with their creative capabilities to generate new works or perspectives. This process requires learners not only to engage both openly and critically with existing artworks but also to possess the ability to integrate and re-create, thereby allowing them to reimagine artistic styles or design new forms of creative expression (White et al., 2000).
The Create level in Bloom’s Taxonomy of Learning Objectives aligns closely with the Integrate phase in AIGC’s generative processes, illustrating a clear correspondence between the two. During generation, AI systems must synthesize and analyze a variety of input elements and style prompts, creatively consolidating them into novel content. This process reflects Bloom’s Create level, as AI systems reorganize and integrate based on existing datasets and algorithmic principles to produce text and images. This type of generative technology extends beyond mere data computation, encompassing the integration of aesthetics, style, and technique, which is similar to the creative thinking of humans. At this stage, AI not only performs generative tasks but also participates in the art appreciation process, where prompts used to describe external forms and images that convey internal meanings must be integrated to achieve precise outputs, which is consistent with the cognitive level of Create defined by Bloom (Liang & Mokhtar, 2024).
The Integrate phase within the AI generative process consolidates the study’s ultimate findings in terms of Aesthetic Experience. Given that AI generative systems can synthesize a variety of artistic styles, creative techniques, and compositional elements to create new forms of art, and considering that AI’s approach to perceiving artworks aligns with established Models of Art Criticism, the textual or visual outputs from AIGC can be seen as records of AI’s cognitive engagement with artworks. The appreciation methodology employed in this study not only references established criteria of artistic judgment but also evaluates the AIGC system’s analytical and integrative capabilities from a technology-assisted perspective to construct an optimal process for application. The interdisciplinary integration of AIGC within art appreciation not only broadens the applications of the arts but also prompts a reevaluation of the relationship between digital technology and aesthetic creation. This approach aims to reveal alternative perspectives in art appreciation, achieving a more diversified Aesthetic Experience with potential extensions into art education.
2.2. Experimental Process
This study utilizes portraiture by Taiwan region’s renowned outsider artist Chiu Ya-Tsai as the experimental sample. Chiu is dedicated to exploring the physiological changes driven by psychological states, referring to his distinctive style as “Analytical-psychological Impressionism,” with a primary focus on the expression and emotion of his portrait subjects. The intense emotional portrayal within his works is rendered through specific compositional elements, with prominent emotional characteristics that facilitate AI recognition. Therefore, this study employs AI-based image generation techniques and AI-assisted tools to investigate the application of AI technology in art appreciation. The study unfolds across four experimental stages. For a detailed overview of the procedures, please refer to Figure 1.
To ensure the acquisition of professional insights into art evaluation, the study enlisted five experts with established credentials in the arts, comprising art scholars, professors in design disciplines, and practitioners in visual communication design. The selection of experts adhered to rigorous criteria: a master’s or doctoral degree in a relevant field, a minimum of 10 years of experience in art appreciation, and proficiency in the practical application of AIGC tools. The evaluation process employed standardized experimental samples and consistent assessment criteria, with consensus achieved through structured expert discussions. Beyond refining the initial sample set, the experts provided detailed recommendations for Prompt optimization to enhance the precision and comprehensiveness of the final evaluative outcomes, ensuring robust and objective results.
This study utilizes the portrait works of Chiu Ya-Tsai as research samples. To streamline the sample size, 70 emotion-themed portraits were initially screened. Through discussions among experts with professional backgrounds in art, and by referencing the expressive characteristics of the seven basic emotions, 10 portraits with pronounced emotional expressions were selected. These works, distinguished by their diverse facial expressions, postures, and emotional intensity, were deemed ideal subjects for research on AI-generated imagery. Subsequently, the experts conducted a formalist analysis of the visual composition of the portraits, identifying four essential components used to convey emotional characteristics: Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere. These components provided a structured framework for subsequent generative processes and served as critical criteria for evaluative analyses.
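The four essential components lend themselves to a structured prompt record, in which each component occupies its own field and an empty field signals an incomplete description. The following Python sketch illustrates one possible representation; the class name `PortraitPrompt`, its methods, and the sample text are assumptions made for illustration and do not reproduce the prompts used in the study.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PortraitPrompt:
    """One field per essential component identified by the experts."""
    facial_expression: str = ""
    body_language: str = ""
    outfit_styling: str = ""
    scene_atmosphere: str = ""

    def missing(self) -> List[str]:
        """Names of components that have no description yet."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def to_text(self) -> str:
        """Join the filled components into a single prompt string."""
        parts = [f"{f.name.replace('_', ' ')}: {getattr(self, f.name)}"
                 for f in fields(self) if getattr(self, f.name).strip()]
        return "; ".join(parts)


# Hypothetical example values, not taken from the study's prompts.
prompt = PortraitPrompt(facial_expression="downcast eyes",
                        scene_atmosphere="dark blue background")
print(prompt.missing())  # ['body_language', 'outfit_styling']
print(prompt.to_text())
```

A record of this kind gives reviewers a direct way to see which component a prompt leaves undescribed before any image is generated.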
Figure 1. The research framework.
To ensure the accuracy of the study, two AI-generation systems, System A and System B, were selected for testing. The chosen AI systems were required to possess both Prompt and Image generation capabilities while maintaining operational generality and accessibility, allowing for routine use by non-experts. This design facilitates subsequent applications and broader dissemination. Upon confirming the systems, a sample analysis of 10 portraits was conducted, followed by the generation of prompts describing the core features of each work. Prompts A and Prompts B were generated independently by each system. Subsequently, these prompts were input into Systems A and B, respectively, to produce images, resulting in four distinct outcomes:
(1) Prompts A + Images A = P (A) + I (A)
(2) Prompts A + Images B = P (A) + I (B)
(3) Prompts B + Images A = P (B) + I (A)
(4) Prompts B + Images B = P (B) + I (B)
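The four outcomes above form a full crossing of prompt source and image generator, which can be enumerated mechanically. The short Python sketch below reproduces the four labels with `itertools.product`; the variable names are illustrative.

```python
from itertools import product

SYSTEMS = ("A", "B")

# Every prompt source paired with every image generator (2 x 2 design).
conditions = [f"P ({prompt_sys}) + I ({image_sys})"
              for prompt_sys, image_sys in product(SYSTEMS, repeat=2)]

for number, label in enumerate(conditions, start=1):
    print(f"({number}) {label}")
```

Enumerating the design this way scales directly if a third system is added to `SYSTEMS`, yielding nine conditions without further changes.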
Following AI content generation, experts with art backgrounds reviewed the content generated by the two systems separately. They confirmed the Completeness and Accuracy of the prompts, referred to the Similarity between the initially generated images and the original artworks, and summarized the direction for modifying the prompts. The experts evaluated the four essential components of portraiture (Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere) to determine which AI system’s prompt generation best reflected the original artwork’s characteristics. Based on the experts’ reviews, the research team established a framework grounded in the four essential components of portraiture to guide the refinement of prompts. These modified prompts were subsequently re-input into the AI systems to generate new images, which were then re-evaluated for their quality and fidelity to the original artistic features.
In the final stage, the research team conducted an in-depth discussion based on the Completeness and Accuracy of the prompts, as well as the Similarity of the generated images, to determine which AI system better aligns with the requirements of the art appreciation process and is more suitable for subsequent applications in art education. Furthermore, the study examined the relationships and representational performance of the four essential components of portraiture to explore the cognitive approach of AI technologies in interpreting emotional expressions in portraiture, thereby synthesizing insights for interdisciplinary applications and related developmental frameworks.
2.3. Research Samples
The expression of emotions in portraiture not only adds depth to the artistic significance of a work but also effectively enhances the emotional connection between the viewer and the piece. The conveyance of emotions breathes life into an artwork, allowing viewers to experience an emotional resonance and engage in deeper reflection (Yücel, 2024). Chiu Ya-Tsai’s portraiture exemplifies this concept, renowned for its intense and nuanced emotional expression, positioning him as a seminal figure in emotional portrayal within the art field. Through adept use of color, composition, and brushwork, Chiu’s portraits allow viewers to perceive the emotional depth embedded in the work, evoking personal responses and introspection. Therefore, utilizing his emotion-themed works as experimental samples not only facilitates an exploration of viewers’ emotional experiences through the lens of Taiwan’s regional cultural context but also leverages the emotional ambiance of the artworks to achieve the effect of emotional contagion.
The capacity of facial expressions in portraiture to convey emotion is central to understanding the relationship between artistic expression and viewer emotional interaction. Artists, by meticulously rendering facial expressions, embed emotional information within visual representations on the canvas, guiding viewers to interpret the emotional states of the subjects (Lee & Yoo, 2022). The theory of basic emotions introduced by Paul Ekman further elucidates the universality of emotional expression. Research indicates that seven fundamental human emotions (Happiness, Sadness, Fear, Anger, Surprise, Disgust, and Contempt) are universally recognizable across cultures through facial expressions (Ekman & Friesen, 1990). This highlights emotion as a foundational mechanism of human adaptation.
However, Ekman noted that genuine emotions might manifest only briefly as micro-expressions on the face, which are fleeting and challenging to discern. He emphasized that facial expressions do not always accurately reflect emotions, as individuals can consciously regulate their facial expressions (Ekman, 1992). Consequently, when interpreting the emotions conveyed in artworks, visual focus may extend beyond facial cues to incorporate auxiliary components, such as background, lighting, body language, and clothing, that can provide additional emotional context, thereby enabling a more holistic interpretation of the work’s emotional content. Building on this theoretical foundation, this study selects Chiu Ya-Tsai’s portrait works as a sample, applying criteria of emotional intensity and richness to identify 10 highly expressive pieces for in-depth analysis. The study examines how emotional expression influences viewers’ aesthetic experience and emotional responses. Detailed examples of the selected works are presented in Figure 2.
Figure 2. Portrait artworks by Chiu Ya-Tsai.
2.4. Questionnaire Survey
This study aims to explore the perception and evaluation of AIGC outputs through the implementation of an anonymous questionnaire, primarily comprising single-choice questions and partially open-ended questions. The questionnaire is divided into three main sections, as shown in Table 2. The first section, Basic Personal Data, is designed to collect background information about the respondents to serve as the basis for subsequent analyses. The survey items include Gender (Female and Male), Age (categorized into six intervals: 20 - 29 years, 30 - 39 years, 40 - 49 years, 50 - 59 years, 60 - 69 years, and Over 70 years), and Professional Background (classified by years of work experience: 1 - 9 years, 10 - 19 years, 20 - 29 years, 30 - 39 years, 40 - 49 years, and Over 50 years).
Table 2. Structure of questionnaire.
| Questionnaire Classification | Classification Item | Answer Options |
| --- | --- | --- |
| Part 1. Basic Personal Data | Gender | Female/Male |
| | Age | 20 - 29/30 - 39/40 - 49/50 - 59/60 - 69/Over 70 |
| | Professional Background | 1 - 9/10 - 19/20 - 29/30 - 39/40 - 49/Over 50 |
| Part 2. Prompts Evaluation. Purpose: Completeness, Accuracy. Sample size: 10 sets of images. | Facial Expression, Body Language, Outfit & Styling, Scene Atmosphere, Holistic Composition | Likert 5 Point Scale |
| | Suggestions | Open-Ended |
| Part 3. Images Evaluation. Purpose: Similarity. Sample size: 20 sets of images. | Facial Expression, Body Language, Outfit & Styling, Scene Atmosphere, Holistic Composition | Likert 5 Point Scale |
The second section, Prompts Evaluation, aims to assess the respondents’ agreement with the textual descriptions accompanying the portraits. The evaluation focuses on the completeness and accuracy of the prompts associated with 10 selected portraits. Respondents are required to rate five evaluation dimensions for each description, with higher scores indicating greater agreement: Facial Expression, Body Language, Outfit & Styling, Scene Atmosphere, and Holistic Composition. A sixth item is an open-ended question soliciting Suggestions, which collects detailed recommendations for improvement.
The third section, Images Evaluation, is designed to assess participants’ intuitive perceptions of the similarity between AI-generated artworks and their corresponding original pieces. The dataset comprises 20 paired images, with 10 generated by System A and 10 by System B. Respondents are tasked with scoring similarity across five dimensions: Facial Expression, Body Language, Outfit & Styling, Scene Atmosphere, and Holistic Composition. Higher scores indicate greater perceived similarity. The questionnaire employs a structured evaluation framework to facilitate an in-depth analysis of AIGC’s effectiveness in artistic appreciation, providing critical insights for future research.
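Ratings collected on the five dimensions can be summarized as a mean score per system and dimension. The sketch below shows one way to do this in Python; the function `mean_scores`, the tuple layout of a response, and the sample scores are all hypothetical and serve only to illustrate the aggregation, not to report any result of this study.

```python
from collections import defaultdict
from statistics import mean

DIMENSIONS = ("Facial Expression", "Body Language", "Outfit & Styling",
              "Scene Atmosphere", "Holistic Composition")


def mean_scores(responses):
    """Average 5-point Likert ratings per (system, dimension).

    `responses` is an iterable of (system, dimension, score) tuples;
    this layout is an assumption made for the example.
    """
    buckets = defaultdict(list)
    for system, dimension, score in responses:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not 1 <= score <= 5:
            raise ValueError(f"score outside the 1-5 scale: {score}")
        buckets[(system, dimension)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up ratings for illustration only.
sample = [("A", "Facial Expression", 4),
          ("A", "Facial Expression", 5),
          ("B", "Facial Expression", 2)]
print(mean_scores(sample))
```

Validating the dimension name and the score range at entry keeps malformed questionnaire rows from silently distorting the averages.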
3. Results
3.1. Expert Analysis
In this study, the completeness and accuracy of artwork analysis are critical factors in selecting the AI analysis systems. The evaluation of completeness focuses on ascertaining whether the prompts adequately and comprehensively encompass all requisite details. Highly complete prompts not only capture every component of the portrait, such as Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere, but also enhance the visual subtlety and depth of the piece. A comprehensive presentation of details allows viewers to more clearly perceive the various expressions within the artwork, thereby deepening their understanding of its meaning. Conversely, prompts lacking completeness may lead to the omission of important details, negatively affecting the richness and coherence of the holistic composition.
The purpose of assessing accuracy is to ensure that the AI can correctly interpret the artwork, such that each component adheres to the specific detail requirements of the piece. Accurate prompts ensure that descriptions faithfully represent the creator’s intent and the visual characteristics of the work, thereby enhancing the authenticity and expressive intent of the artwork. A lack of accuracy in prompts may result in deviations in the AI’s interpretation of various components within the portrait. Given the high demands for completeness and accuracy in artwork analysis, the selection of an appropriate AI system is paramount.
Table 3. Completeness and accuracy of AI prompts (I).
 |  | Completeness |  |  |  |  |  |  |  | Accuracy |  |  |  |  |  |  | 
 |  | Facial Expression |  | Body Language |  | Outfit & Styling |  | Scene Atmosphere |  | Facial Expression |  | Body Language |  | Outfit & Styling |  | Scene Atmosphere | 
No. | Portrait | A | B | A | B | A | B | A | B | A | B | A | B | A | B | A | B
1 | Dancer in the Moonlight | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0
2 | Figure in Red Dress | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1
3 | Melancholy | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1
4 | Personality | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1
5 | Love and Hatred | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1
6 | Frosting | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0
7 | Staring | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0
8 | Noble | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1
9 | Young Lady | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0
10 | Lifetime Colon | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0
 | Total | 7 | 0 | 9 | 4 | 10 | 8 | 10 | 10 | 7 | 0 | 9 | 3 | 10 | 8 | 10 | 5
In the table, the value “1” indicates the presence of the specified characteristic for the given component, while the value “0” signifies its absence.
Thus, this study conducted preliminary tests, followed by a comparative analysis of different systems based on the initial assessment of similarity in AI-generated images, to identify the most suitable system for subsequent research. Table 3 presents these test results in detail, illustrating the relative performance of various systems in terms of completeness and accuracy, providing a reliable basis for the selection of systems in subsequent research. The assessment of completeness and accuracy directly influences the precise expression of the artistic intent being presented; therefore, this study performed a detailed expert analysis of these two elements, and the results are summarized as follows.
3.1.1. Preliminary Evaluation for Completeness of AI Prompts
In this study, we compared the performance of Prompt A and Prompt B in interpreting 10 portrait paintings, with a detailed examination of four essential components: Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere. Through the analysis of quantified results, we can clearly identify the differences between the two prompts across various components.
Regarding Facial Expression, Prompt A explicitly described facial expressions in 7 out of the 10 portraits, indicating a focus on expression analysis; however, 3 portraits lacked related descriptions, revealing minor omissions. In contrast, Prompt B completely lacked any description of facial expressions in all 10 artworks, demonstrating a significant deficiency in the emotional interpretative capability of portrait analysis. Body Language emerged as another component showcasing notable differences. Prompt A provided specific descriptions of body language in 9 portraits, suggesting an analytical capability to capture bodily movements and postures. Conversely, Prompt B only addressed body language in 4 portraits, indicating an inability to comprehensively capture this component and a relative shortcoming in its descriptive performance.
In terms of Outfit & Styling, Prompt A exhibited higher completeness, accurately describing clothing and accessories in all 10 portraits, thereby reflecting a keen sensitivity and comprehensiveness in the analysis of visual details. Prompt B, however, addressed this component in 8 portraits, leaving 2 portraits without mention of clothing and accessories, indicating certain deficiencies in visual detail capture. With respect to Scene Atmosphere, both systems demonstrated consistency in performance. Both Prompt A and Prompt B provided complete descriptions of this component across all 10 portraits, indicating that both prompts possess certain capabilities and completeness in context depiction and atmosphere creation.
Overall, the analysis of these 10 portraits clearly demonstrates that Prompt A outperforms Prompt B in completeness across the components of Facial Expression, Body Language, and Outfit & Styling. Particularly in the depiction of Facial Expression and Body Language, Prompt A exhibited a high degree of detail and comprehensiveness. Furthermore, both prompts performed well in the description of Scene Atmosphere, successfully capturing the ambiance created by the artworks.
Specifically, of the 10 selected portraits, 6 works (No.1 “Dancer in the Moonlight,” No.2 “Figure in Red Dress,” No.3 “Melancholy,” No.4 “Personality,” No.5 “Love and Hatred,” and No.10 “Lifetime Colon”) received comprehensive descriptions from Prompt A across all four essential components, reflecting a high level of completeness in their analysis results. Of the remaining 4 portraits, No.8 “Noble” lacked only a description of Body Language, while the other 3 paintings (No.6 “Frosting,” No.7 “Staring,” and No.9 “Young Lady”) omitted only the Facial Expression description. Overall, Prompt A therefore demonstrated superior completeness in artwork analysis, particularly in addressing Outfit & Styling and Scene Atmosphere, providing more systematic and comprehensive support. Consequently, Prompt A is chosen as the preferred system for artwork analysis, proving more suitable than the alternative system.
3.1.2. Preliminary Evaluation for Accuracy of AI Prompts
In this study, we conducted a detailed accuracy analysis of Prompt A and Prompt B, evaluating four essential components: Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere, in order to compare their performance differences in portrait analysis.
In terms of Facial Expression, Prompt A accurately described the facial expressions in 7 of the 10 portraits. This indicates a high level of accuracy in capturing emotional expressions and subtle facial features, demonstrating consistency and reliability in its understanding and articulation of the expression components. Prompt B, which offered no facial-expression descriptions at all, recorded no accurate descriptions for this component (Table 3). For the description of Body Language, Prompt A provided accurate narratives for 9 portraits, showcasing its precision in analyzing bodily movements and posture details and reflecting a sound understanding of body language representation. In contrast, Prompt B addressed body language in 4 portraits, with 3 descriptions being correct and 1 containing an error. This reveals accuracy issues with Prompt B, which failed to consistently represent body language within the artworks.
Regarding Outfit & Styling, Prompt A accurately described the clothing and accessories in all 10 portraits, demonstrating a high level of precision and consistency in handling clothing and adornment details. Prompt B performed relatively well in this component, with correct descriptions of clothing and accessories in 8 of the portraits. In the analysis of Scene Atmosphere, Prompt A again exhibited superior accuracy, providing correct depictions of the scene atmosphere in all 10 artworks, thereby showcasing its profound understanding of the emotional and situational expression within the paintings. In contrast, Prompt B displayed weaker performance in this component, with only 5 portraits having correct descriptions of the scene atmosphere, while the other 5 contained errors. This indicates that Prompt B struggled with the accuracy of its analysis regarding atmosphere and environmental context, affecting its overall analytical effectiveness.
Based on the accuracy evaluation results of the 10 portraits, Prompt A achieved high accuracy across all four essential components, with no erroneous analyses, indicating exceptional precision in interpreting the artworks and describing their components. Although Prompt B performed relatively well regarding Outfit & Styling, where all 8 of the descriptions it provided were correct, its accuracy in other components (particularly Scene Atmosphere and Body Language) was markedly insufficient, exhibiting multiple errors. Thus, in light of the accuracy comparison, Prompt A not only demonstrated high completeness in artwork analysis but also surpassed Prompt B in accuracy, making it a more suitable tool for systematic artwork analysis.
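The totals discussed in Sections 3.1.1 and 3.1.2 can be re-derived from the binary codings in Table 3. The following is a minimal sketch rather than the procedure used in the study: the names `ROWS`, `tally`, and `errors` are illustrative, and the error count assumes that a description coded as present but not accurate counts as one error, which matches the reading "4 described, 3 correct, 1 error" given for Prompt B's Body Language.

```python
# Binary codings transcribed from Table 3 (1 = characteristic present, 0 = absent).
# Each row holds 16 values for one portrait (No. 1 to 10), in the table's column order:
# completeness for Facial Expression (A, B), Body Language (A, B),
# Outfit & Styling (A, B), Scene Atmosphere (A, B),
# followed by the same eight columns for accuracy.
COMPONENTS = ["Facial Expression", "Body Language", "Outfit & Styling", "Scene Atmosphere"]

ROWS = [
    (1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0),  # 1  Dancer in the Moonlight
    (1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1),  # 2  Figure in Red Dress
    (1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1),  # 3  Melancholy
    (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),  # 4  Personality
    (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1),  # 5  Love and Hatred
    (0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0),  # 6  Frosting
    (0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0),  # 7  Staring
    (1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1),  # 8  Noble
    (0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0),  # 9  Young Lady
    (1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0),  # 10 Lifetime Colon
]


def tally(rows):
    """Sum each binary column, keyed by (criterion, component, prompt)."""
    totals = {}
    for criterion, offset in (("completeness", 0), ("accuracy", 8)):
        for i, component in enumerate(COMPONENTS):
            for j, prompt in enumerate("AB"):
                column = offset + 2 * i + j
                totals[(criterion, component, prompt)] = sum(row[column] for row in rows)
    return totals


totals = tally(ROWS)

# Assumed definition: descriptions that were present but not accurate.
errors = {
    (component, prompt): totals[("completeness", component, prompt)]
    - totals[("accuracy", component, prompt)]
    for component in COMPONENTS
    for prompt in "AB"
}

for component in COMPONENTS:
    for prompt in "AB":
        print(
            f"{component:17s} Prompt {prompt}: "
            f"described={totals[('completeness', component, prompt)]:2d} "
            f"accurate={totals[('accuracy', component, prompt)]:2d} "
            f"errors={errors[(component, prompt)]}"
        )
```

Under this assumed definition, Prompt A shows no errors in any component, while Prompt B's errors fall in Body Language and Scene Atmosphere, consistent with the comparison above.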
3.1.3. Similarity between AI-Generated Images and Original Images
Regarding the AI-generated images, this study evaluates the similarity between the generated results and the original images (Table 4). Through the overall analysis of 10 artworks, we compared the generated AI images with the originals. It was found that Prompt A described the Facial Expressions in several artworks, and the AI images generated according to these prompts exhibited emotional expressions in the facial features that were closer to the facial emotions of the originals. In contrast, Prompt B lacked any descriptions of Facial Expressions, resulting in the displayed facial emotions appearing more random in the artwork.
In addition to Facial Expression, Prompt A also outperformed Prompt B in terms of completeness concerning Body Language and Outfit & Styling. For the Scene Atmosphere, both prompts demonstrated satisfactory completeness. However, a deeper analysis of the prompt content revealed that Prompt A consistently described accurate details, showcasing better accuracy. While Prompt B performed relatively well in the description of Outfit & Styling, its accuracy in other critical components, especially in presenting the Body Language and Scene Atmosphere, was still insufficient, with several incorrect descriptions. In the overall interpretation of Systems A and B, when the completeness of the prompts is suboptimal, System A tends to independently incorporate novel visual elements, whereas System B adopts a more conservative approach, rendering only simplified content. In both cases, the generated images exhibit deviations from the original image.
The primary objective of AI image generation lies in accurately reproducing the content specified in the prompts, ensuring that the generated images faithfully reflect the detailed requirements of the prompts. The accuracy of prompts plays a pivotal role in determining the similarity between the generated images and the original reference. Comparative analysis of the AI-generated images reveals that the elements in Prompt A are precise and error-free, resulting in images that more closely resemble the original reference compared to those generated from Prompt B. Due to multiple inaccuracies in Prompt B, the corresponding generated images deviate from the original in certain details. These inaccuracies result in the inclusion of visual elements in the generated images that were not present in the original, leading to alternative interpretations.
3.2. Modified Prompt
In this study, we undertook a thorough examination and analysis of the Initial Prompt, revealing that the features extracted were both disparate and lacking a coherent focus. To enhance the professionalism and applicability of the art appreciation process, we refined the Initial Prompt to direct attention toward specific aspects of artistic representation. This refinement involved the application of a framework-based prompt design intended to guide the generative process, thereby transforming the Initial Prompt into a Modified Prompt that facilitates the generation of specific, thematically coherent narrative content.
Table 4. Cross-comparison of AI-generated images.
Subsequently, we employed the Modified Prompt to generate AI images with a concentrated focus, integrating this output into the art appreciation framework to assist viewers in their comprehension of the original artworks. This methodology highlights the pivotal role of prompt content types in the AI generative process. By steering the generated content toward a more professionally oriented narrative direction, we effectively enhanced the precision of the appreciation experience, allowing viewers to engage more profoundly with the meanings and artistic values encapsulated within the original works through the lens of AI-generated images.
Table 5. System comparison of AI-generated images.
3.3. AI-Generated Images
This study employed the Modified Prompt to generate artificial intelligence images aligned with the research focus, as illustrated in Table 5. Through a comparative analysis of the four essential components (Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere), the results demonstrate that the images generated from the Modified Prompt resemble the original features more closely than those generated from the Initial Prompt. The application of the Modified Prompt also significantly reduced the inclusion of excessive additional elements, allowing the generated images to more accurately represent the visual and stylistic characteristics of the original artworks, thereby enhancing the overall similarity between the generated images and the originals.
3.4. Statistical Analysis
To further elucidate the efficacy of the Modified Prompt, this study designed and implemented a questionnaire survey targeting respondents with professional backgrounds in art or design. The survey aimed to collect their specific evaluations of the prompt. A total of 108 valid responses were collected, and statistical methods were applied to perform numerical analyses across various elements.
The questionnaire employed consistent measurement tools and time points throughout the study. To ensure the homogeneity of the measurement items, internal consistency reliability was assessed using Cronbach’s α, with the results presented in Table 6. The Cronbach’s α coefficient was calculated as 0.819. Notably, only the removal of the Scene Atmosphere item resulted in a slight increase in the coefficient to 0.826. Overall, eliminating any of the essential components failed to produce a significant improvement in the original internal consistency coefficient. Therefore, item deletion was deemed unnecessary, as the questionnaire demonstrates satisfactory reliability.
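For readers unfamiliar with the reliability statistic, Cronbach's α and the "α if item deleted" check can be sketched as follows. This is an illustration only: the study's raw responses are not reproduced here, so `TOY_RATINGS` is a hypothetical set of 5-point ratings on five items, and the function names are our own. The sketch uses the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed score).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, each row holding k item scores."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(responses, index):
    """Recompute alpha after dropping the item at position `index`."""
    reduced = [row[:index] + row[index + 1:] for row in responses]
    return cronbach_alpha(reduced)


# Hypothetical ratings (NOT the study's data): six ratings of five items
# on a 5-point scale, in the order Facial Expression, Body Language,
# Outfit & Styling, Scene Atmosphere, Holistic Composition.
TOY_RATINGS = [
    [4, 4, 5, 3, 4],
    [3, 3, 4, 4, 3],
    [5, 4, 5, 4, 5],
    [2, 3, 3, 3, 2],
    [4, 5, 4, 2, 4],
    [3, 2, 3, 4, 3],
]

overall = cronbach_alpha(TOY_RATINGS)
print(f"alpha = {overall:.3f}")
for i in range(len(TOY_RATINGS[0])):
    print(f"alpha if item {i} deleted = {alpha_if_item_deleted(TOY_RATINGS, i):.3f}")
```

An item is a candidate for deletion only when removing it raises α appreciably, which is the criterion applied to Scene Atmosphere above.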
The correlation coefficients indicate that Holistic Composition demonstrates significant positive correlations with the four essential components of emotional composition in portraiture (p < 0.05), as shown in Table 7. Every essential component covaries positively with Holistic Composition: higher similarity in any individual component corresponds to higher similarity in the Holistic Composition. The correlation coefficients between Holistic Composition and the four essential components indicate varying degrees of association, ranked from highest to lowest as follows: Outfit & Styling (0.624), Body Language (0.622), Facial Expression (0.571), and Scene Atmosphere (0.540). Taken together, the correlation coefficients for these components are of similar magnitude.
Table 6. Reliability analysis of the questionnaire.
Reliability Statistics
Cronbach’s α | Cronbach’s α Based on Standardized Items | N of Items
0.819 | 0.825 | 5
Item-Total Statistics
Essential Components | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Squared Multiple Correlation | Cronbach’s α if Item Deleted
Facial Expression | 14.1542 | 13.048 | 0.571 | 0.406 | 0.797
Body Language | 13.9167 | 12.930 | 0.627 | 0.460 | 0.779
Outfit & Styling | 14.0833 | 13.056 | 0.636 | 0.429 | 0.776
Scene Atmosphere | 14.2819 | 13.952 | 0.470 | 0.325 | 0.826
Holistic Composition | 14.2028 | 12.757 | 0.787 | 0.620 | 0.738
Table 7. Pearson correlation of essential components.
 |  | Facial Expression | Body Language | Outfit & Styling | Scene Atmosphere | Holistic Composition
Facial Expression | Pearson Correlation | 1 | 0.565** | 0.434** | 0.252** | 0.571**
 | Sig. (2-tailed) |  | 0.000 | 0.000 | 0.000 | 0.000
Body Language | Pearson Correlation | 0.565** | 1 | 0.475** | 0.304** | 0.622**
 | Sig. (2-tailed) | 0.000 |  | 0.000 | 0.000 | 0.000
Outfit & Styling | Pearson Correlation | 0.434** | 0.475** | 1 | 0.458** | 0.624**
 | Sig. (2-tailed) | 0.000 | 0.000 |  | 0.000 | 0.000
Scene Atmosphere | Pearson Correlation | 0.252** | 0.304** | 0.458** | 1 | 0.540**
 | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 |  | 0.000
Holistic Composition | Pearson Correlation | 0.571** | 0.622** | 0.624** | 0.540** | 1
 | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 
Ranking |  | 3 | 2 | 1 | 4 | 0
**Correlation is significant at the 0.01 level (2-tailed). N = 2160 (20 images × 108 subjects).
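The ranking row of Table 7 follows directly from ordering the coefficients in the Holistic Composition column. The sketch below reproduces that ordering from the reported values and also shows how a Pearson coefficient is computed; `pearson_r` and the small numeric example are illustrative, since the 2160 underlying rating pairs are not available here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Coefficients with Holistic Composition, as reported in Table 7.
R_WITH_HOLISTIC = {
    "Facial Expression": 0.571,
    "Body Language": 0.622,
    "Outfit & Styling": 0.624,
    "Scene Atmosphere": 0.540,
}

# Order components from strongest to weakest association.
ranking = sorted(R_WITH_HOLISTIC, key=R_WITH_HOLISTIC.get, reverse=True)
for rank, component in enumerate(ranking, start=1):
    print(rank, component, R_WITH_HOLISTIC[component])

# Illustrative use of pearson_r on made-up, perfectly linear scores.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```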
4. Discussion
4.1. Completeness of AI Prompts
To evaluate the completeness of the Modified Prompt, this study utilized a survey methodology, coupled with quantitative data analysis techniques. Detailed quantitative analyses of each metric are presented in Table 8. A questionnaire was designed using a 5-point Likert scale, where higher scores indicated a more favorable evaluation of prompt completeness by respondents. After calculating the mean scores from the survey data, results indicated that the completeness ratings for all 10 portrait prompts exceeded a score of 3.00, with the highest average score being 4.22 and the lowest being 3.69. These findings suggest that respondents generally held a positive view of the prompts, perceiving them as highly complete in their descriptive content. This positive assessment reflects the clarity and adequacy of the prompts in guiding the portrayal of the artworks, indicating that expert-designed prompts provide a level of reference value and reliability in conveying the content of the portraits.
4.2. Accuracy of AI Prompts
To comprehensively assess the accuracy of the Modified Prompt, this study implemented a structured questionnaire survey, employing a 5-point rating scale. The evaluation targeted four essential components: Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere. Higher scores indicated a more favorable evaluation by respondents of the prompt’s accuracy in representing each specific component. The average scores revealed that each component consistently scored above 3.00, indicating a generally positive appraisal of the prompts’ accuracy.
Table 8. Completeness and accuracy of AI prompts (II).
 |  | Accuracy |  |  |  | Completeness
No. | Portrait | Facial Expression | Body Language | Outfit & Styling | Scene Atmosphere | Holistic Composition
1 | Dancer in the Moonlight | 3.69 | 3.94 | 3.61 | 4.03 | 3.69
2 | Figure in Red Dress | 3.92 | 3.22 | 4.44 | 4.25 | 4.22
3 | Melancholy | 3.89 | 3.92 | 4.19 | 3.97 | 4.22
4 | Personality | 3.53 | 4.33 | 4.06 | 4.19 | 4.17
5 | Love and Hatred | 3.86 | 4.11 | 4.28 | 3.89 | 4.00
6 | Frosting | 4.36 | 3.69 | 4.39 | 4.36 | 4.17
7 | Staring | 4.14 | 4.36 | 4.39 | 3.89 | 4.17
8 | Noble | 4.06 | 4.33 | 4.19 | 4.11 | 4.19
9 | Young Lady | 4.00 | 4.11 | 4.42 | 4.06 | 4.08
10 | Lifetime Colon | 3.22 | 3.89 | 4.08 | 4.19 | 3.89
 | Average | 3.87 | 3.99 | 4.21 | 4.09 | 4.08
Specifically, within the Facial Expression category, all 10 portraits scored above 3.00, with No.6 “Frosting” achieving the highest score of 4.36 and No.10 “Lifetime Colon” the lowest at 3.22. In the Body Language category, No.7 “Staring” was rated highest at 4.36 and No.2 “Figure in Red Dress” lowest at 3.22. Regarding Outfit & Styling, No.2 “Figure in Red Dress” achieved the top rating of 4.44, while No.1 “Dancer in the Moonlight” received the lowest score at 3.61. Finally, in the Scene Atmosphere evaluation, the average scores for all 10 portraits were also above 3.00, with No.6 “Frosting” achieving the highest rating at 4.36, and No.5 “Love and Hatred” and No.7 “Staring” tying for the lowest rating at 3.89.
Based on the overall data for the 10 portraits in Table 8, Outfit & Styling shows the highest average accuracy among the four essential components (4.21), indicating that the Modified Prompt achieves greater consistency with the original image in this aspect than in the other components. In contrast, Facial Expression shows the lowest average accuracy (3.87), suggesting that there remains room for improvement in how the Modified Prompt describes Facial Expression.
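The column averages and the highest and lowest portraits per component can be checked against Table 8 with a few lines of code. This is a sketch under the assumption that the per-portrait means in Table 8 are the only inputs; `TABLE_8` and `summarize` are illustrative names. Because the tabulated means are already rounded to two decimals, recomputed averages may differ from the reported ones in the last digit.

```python
COLUMNS = [
    "Facial Expression",
    "Body Language",
    "Outfit & Styling",
    "Scene Atmosphere",
    "Holistic Composition",
]

# Mean ratings per portrait, transcribed from Table 8 in COLUMNS order.
TABLE_8 = {
    1: ("Dancer in the Moonlight", 3.69, 3.94, 3.61, 4.03, 3.69),
    2: ("Figure in Red Dress", 3.92, 3.22, 4.44, 4.25, 4.22),
    3: ("Melancholy", 3.89, 3.92, 4.19, 3.97, 4.22),
    4: ("Personality", 3.53, 4.33, 4.06, 4.19, 4.17),
    5: ("Love and Hatred", 3.86, 4.11, 4.28, 3.89, 4.00),
    6: ("Frosting", 4.36, 3.69, 4.39, 4.36, 4.17),
    7: ("Staring", 4.14, 4.36, 4.39, 3.89, 4.17),
    8: ("Noble", 4.06, 4.33, 4.19, 4.11, 4.19),
    9: ("Young Lady", 4.00, 4.11, 4.42, 4.06, 4.08),
    10: ("Lifetime Colon", 3.22, 3.89, 4.08, 4.19, 3.89),
}


def summarize(column_name):
    """Return the mean plus the highest and lowest scores (with ties) for one column."""
    index = COLUMNS.index(column_name) + 1  # position 0 holds the portrait title
    scores = {number: row[index] for number, row in TABLE_8.items()}
    highest = max(scores.values())
    lowest = min(scores.values())
    return {
        "mean": sum(scores.values()) / len(scores),
        "highest": (highest, [n for n, s in scores.items() if s == highest]),
        "lowest": (lowest, [n for n, s in scores.items() if s == lowest]),
    }


for name in COLUMNS:
    summary = summarize(name)
    print(f"{name}: mean={summary['mean']:.3f} "
          f"highest={summary['highest']} lowest={summary['lowest']}")
```

Returning every portrait that shares an extreme value makes ties visible, such as the two portraits that share the lowest Scene Atmosphere score.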
4.3. Similarity of AI-Generated Images
To evaluate the similarity between AI-generated images and their original counterparts, this study conducted a survey-based questionnaire. A 5-point scale was used, with higher scores indicating a greater perceived similarity between the AI-generated images and the originals. The average scores derived from the survey responses are presented in Table 9.
This study conducted a comparative analysis of System A and System B by inputting 10 portrait artworks into each system and evaluating the 20 AI-generated images produced. The assessment focused on the degree of similarity between the AI-generated images and the original portraits across the essential components. In terms of Holistic Composition, the images generated by System B were generally perceived as closer to the originals. However, for Artwork No.4 “Personality,” the outputs from System A were evaluated as more similar to the original. A more detailed investigation into the specific components of this artwork revealed that System B achieved lower similarity scores than System A in Outfit & Styling (2.44 versus 3.86) and Scene Atmosphere (2.39 versus 3.61). Although the overall compositions of the AI-generated images from both systems were comparable, System A demonstrated greater alignment with the original in terms of clothing details and scene color schemes. These differences significantly influenced the overall evaluation results.
Table 9. Survey results on the similarity of AI-generated images.
 |  | Facial Expression |  | Body Language |  | Outfit & Styling |  | Scene Atmosphere |  | Holistic Composition | 
No. | Portrait | A | B | A | B | A | B | A | B | A | B
1 | Dancer in the Moonlight | 2.94 | 3.81 | 3.33 | 4.03 | 3.11 | 3.58 | 2.78 | 2.94 | 3.14 | 3.56
2 | Figure in Red Dress | 2.89 | 3.11 | 2.36 | 4.36 | 4.00 | 4.08 | 3.67 | 3.53 | 3.08 | 3.75
3 | Melancholy | 2.64 | 4.28 | 2.86 | 4.31 | 3.08 | 3.00 | 3.72 | 3.17 | 3.00 | 3.69
4 | Personality | 3.42 | 3.97 | 3.64 | 4.03 | 3.86 | 2.44 | 3.61 | 2.39 | 3.61 | 3.19
5 | Love and Hatred | 2.81 | 4.22 | 3.72 | 4.28 | 3.11 | 3.69 | 3.56 | 2.97 | 3.03 | 3.61
6 | Frosting | 3.78 | 4.25 | 3.50 | 4.42 | 3.81 | 4.47 | 4.03 | 3.47 | 3.67 | 4.17
7 | Staring | 2.86 | 3.94 | 3.03 | 4.17 | 3.28 | 3.97 | 3.75 | 3.00 | 3.14 | 3.67
8 | Noble | 3.61 | 4.08 | 3.19 | 4.44 | 3.28 | 4.36 | 3.83 | 3.17 | 3.42 | 4.03
9 | Young Lady | 3.58 | 4.25 | 3.50 | 4.19 | 3.69 | 4.11 | 3.47 | 3.97 | 3.39 | 3.94
10 | Lifetime Colon | 2.69 | 2.97 | 3.19 | 4.31 | 2.81 | 3.78 | 3.06 | 3.47 | 2.64 | 3.42
Similarly, for Artwork No.3 “Melancholy”, the images generated by System A demonstrated higher similarity in Outfit & Styling and Scene Atmosphere. However, the Holistic Composition evaluations indicated that the outputs from System B were more closely aligned with the original. While System A displayed greater consistency in color coordination with the original artwork, System B’s rendering of figure posture, a visually dominant component of the composition, was more accurate, ultimately contributing to its higher ratings in Holistic Composition.
Beyond the two aforementioned works, the AI-generated images of five further portraits (No.2 “Figure in Red Dress”, No.5 “Love and Hatred”, No.6 “Frosting”, No.7 “Staring”, and No.8 “Noble”) demonstrated that although System B consistently achieved superior performance in Holistic Composition, System A achieved higher similarity scores in Scene Atmosphere. As summarized in Table 10, the aggregated similarity scores across the four essential components reveal that only for No.4 “Personality” did System A (14.53) surpass System B (12.83). This observation aligns with the comparative results of Holistic Composition, highlighting the interplay of the four essential components in shaping overall similarity. Nevertheless, reliance on a single component is inadequate for evaluating overall similarity, necessitating a comprehensive assessment that incorporates all four essential components.
Table 10. The aggregate score of four essential components.
No. | Portrait | A | B | No. | Portrait | A | B
1 | Dancer in the Moonlight | 12.17 | 14.36 | 6 | Frosting | 15.11 | 16.61
2 | Figure in Red Dress | 12.92 | 15.08 | 7 | Staring | 12.92 | 15.08
3 | Melancholy | 12.31 | 14.75 | 8 | Noble | 13.92 | 16.06
4 | Personality | 14.53 | 12.83 | 9 | Young Lady | 14.25 | 16.53
5 | Love and Hatred | 13.19 | 15.17 | 10 | Lifetime Colon | 11.75 | 14.53
Continuing from Table 9, the four essential components include only Facial Expression, Body Language, Outfit & Styling, and Scene Atmosphere.
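The aggregate scores in Table 10 are the sums of the four component means in Table 9, with Holistic Composition excluded. The sketch below recomputes them; `TABLE_9`, `REPORTED_TABLE_10`, and `aggregate` are illustrative names. Because Table 9 lists means already rounded to two decimals, a recomputed sum can differ from the reported aggregate by about 0.01, so the comparison uses a small tolerance.

```python
# Mean similarity scores from Table 9. Per portrait: (A, B) pairs for
# Facial Expression, Body Language, Outfit & Styling, Scene Atmosphere,
# and Holistic Composition, in that order.
TABLE_9 = {
    1: (2.94, 3.81, 3.33, 4.03, 3.11, 3.58, 2.78, 2.94, 3.14, 3.56),
    2: (2.89, 3.11, 2.36, 4.36, 4.00, 4.08, 3.67, 3.53, 3.08, 3.75),
    3: (2.64, 4.28, 2.86, 4.31, 3.08, 3.00, 3.72, 3.17, 3.00, 3.69),
    4: (3.42, 3.97, 3.64, 4.03, 3.86, 2.44, 3.61, 2.39, 3.61, 3.19),
    5: (2.81, 4.22, 3.72, 4.28, 3.11, 3.69, 3.56, 2.97, 3.03, 3.61),
    6: (3.78, 4.25, 3.50, 4.42, 3.81, 4.47, 4.03, 3.47, 3.67, 4.17),
    7: (2.86, 3.94, 3.03, 4.17, 3.28, 3.97, 3.75, 3.00, 3.14, 3.67),
    8: (3.61, 4.08, 3.19, 4.44, 3.28, 4.36, 3.83, 3.17, 3.42, 4.03),
    9: (3.58, 4.25, 3.50, 4.19, 3.69, 4.11, 3.47, 3.97, 3.39, 3.94),
    10: (2.69, 2.97, 3.19, 4.31, 2.81, 3.78, 3.06, 3.47, 2.64, 3.42),
}

# Aggregates (A, B) as reported in Table 10.
REPORTED_TABLE_10 = {
    1: (12.17, 14.36), 2: (12.92, 15.08), 3: (12.31, 14.75), 4: (14.53, 12.83),
    5: (13.19, 15.17), 6: (15.11, 16.61), 7: (12.92, 15.08), 8: (13.92, 16.06),
    9: (14.25, 16.53), 10: (11.75, 14.53),
}


def aggregate(row):
    """Sum the four essential components (first eight values) for systems A and B."""
    components = row[:8]  # drop the Holistic Composition pair
    return sum(components[0::2]), sum(components[1::2])


aggregates = {number: aggregate(row) for number, row in TABLE_9.items()}

# Portraits where System A's aggregate exceeds System B's.
a_ahead_overall = [n for n, (a, b) in aggregates.items() if a > b]

# Portraits where System A leads on Scene Atmosphere alone (values 6 and 7 of each row).
a_ahead_scene = [n for n, row in TABLE_9.items() if row[6] > row[7]]

for number, (a, b) in aggregates.items():
    print(number, round(a, 2), round(b, 2))
print("A ahead on aggregate:", a_ahead_overall)
print("A ahead on Scene Atmosphere:", a_ahead_scene)
```

The contrast between the two lists illustrates the point made above: System A leads on Scene Atmosphere for most portraits, yet leads on the four-component aggregate for only one.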
Table 11 provides a comparative summary of the similarity rankings of the two AI systems across the four essential components. While System A exhibited marginally lower performance than System B in Holistic Composition, it demonstrated distinct advantages in capturing the Scene Atmosphere of the imagery. In contrast, System B excelled in rendering human figures in portraiture, particularly in depicting posture, physical gestures, and facial expressions. Given that the artworks selected for this study are portraitures, the accurate depiction of the human figure emerges as a critical factor, contributing to System B’s closer alignment with the originals. However, a shift in genre to landscapes or other art forms could potentially alter the comparative strengths of these AI systems. The interpretive distinctions observed between the two systems in art appreciation reflect their differing emphases on specific components. This phenomenon underscores the subjectivity inherent in art appreciation, where individual perspectives and priorities shape interpretations of artistic value and aesthetic experience.
Table 11. The Similarity ranking between holistic composition and the four essential components.
Ranking | 1 | 2 | 3 | 4 | 5
System A Average | Scene Atmosphere | Outfit & Styling | Body Language | Holistic Composition | Facial Expression
 | 3.55 | 3.40 | 3.23 | 3.21 | 3.12
System B Average | Body Language | Facial Expression | Outfit & Styling | Holistic Composition | Scene Atmosphere
 | 4.25 | 3.89 | 3.75 | 3.70 | 3.21
The values represent the mean scores of the 10 portrait paintings.
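The rankings in Table 11 can be reproduced by sorting each system's mean scores, and the per-component gap between systems makes the contrast discussed above explicit. This is a minimal sketch based only on the reported means; `SYSTEM_A`, `SYSTEM_B`, and `rank` are illustrative names.

```python
# Mean similarity scores over the 10 portraits, as reported in Table 11.
SYSTEM_A = {
    "Facial Expression": 3.12,
    "Body Language": 3.23,
    "Outfit & Styling": 3.40,
    "Scene Atmosphere": 3.55,
    "Holistic Composition": 3.21,
}
SYSTEM_B = {
    "Facial Expression": 3.89,
    "Body Language": 4.25,
    "Outfit & Styling": 3.75,
    "Scene Atmosphere": 3.21,
    "Holistic Composition": 3.70,
}


def rank(scores):
    """Order component names from highest to lowest mean score."""
    return sorted(scores, key=scores.get, reverse=True)


ranking_a = rank(SYSTEM_A)
ranking_b = rank(SYSTEM_B)

# Positive gap: System B is rated closer to the original; negative: System A is.
gaps = {name: round(SYSTEM_B[name] - SYSTEM_A[name], 2) for name in SYSTEM_A}
a_leads = [name for name, gap in gaps.items() if gap < 0]

print("System A ranking:", ranking_a)
print("System B ranking:", ranking_b)
print("B minus A:", gaps)
print("Components where A leads:", a_leads)
```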
5. Conclusion
This study systematically reconstructs the process of art appreciation by integrating advanced AI technologies. Utilizing AI’s capabilities in image recognition, analysis, and generation, the research provides viewers with a diverse array of auxiliary tools, fostering deeper interpretation and nuanced understanding of artistic works. The findings reveal that System A exhibits exceptional precision in both the completeness and accuracy of prompt generation, effectively capturing and articulating the formal characteristics of portraiture. This capability significantly enhances viewers’ aesthetic perception by accurately conveying the morphological language of portraiture and improving their ability to analyze visual composition. Through the proposed art appreciation framework, viewers are empowered to deconstruct portraiture and, by leveraging AI-based image generation techniques, interpret artworks from multifaceted perspectives. This assisted approach not only enhances the precision and diversity of art appreciation but also introduces innovative methodologies into art education and learning processes. Additionally, it deepens viewers’ emotional engagement with artworks, providing valuable insights for future applications and practices in art education.
Traditional art appreciation often requires viewers to possess foundational knowledge of art, which can present significant challenges for beginners. This study proposes an innovative AI-assisted framework for analyzing and interpreting artworks, redefining traditional approaches rooted in intuitive observation and personal interpretation. First, AI leverages deep learning models to analyze extensive datasets of artworks, systematically quantifying and refining elements such as composition, color, and technique. This data-driven methodology enables viewers to rapidly comprehend the defining characteristics of various artistic styles. Second, the reconstruction of visual perspectives through AI-generated images introduces novel interpretative pathways. Interactive prompts incorporating semantic explanations and visual analyses support beginners in identifying the fundamental elements of artworks, thereby enhancing their understanding and engagement. The integration of AI technologies not only systematizes and enriches the multi-layered process of art appreciation but also facilitates a more accessible and inclusive model for public art education. By promoting the digitization and popularization of art appreciation, this approach broadens the scope of artistic experiences and fosters deeper connections between viewers and the art world.
AI applications in art education extend beyond knowledge dissemination to emphasize emotional understanding of artworks. Traditional art education relies heavily on teachers’ guidance and students’ subjective experiences to inspire emotional insights. In contrast, AI introduces objective and precise tools for emotional recognition. By employing image recognition and emotion analysis, AI systems assist learners in identifying emotional expressions in artworks, such as the psychological implications of color or the emotive impact of dynamic compositions. These tools illustrate how artists convey emotions through specific techniques, enabling learners to progressively deepen their emotional sensitivity and understanding of artworks. This AI-driven approach transforms art education from a mere transmission of techniques and knowledge into a profound experience of emotional engagement and perception. The AI-assisted framework enables beginners to not only “view” but also “perceive” the structural details of artworks. Through interpretative content generation, it reveals hidden meanings from novel perspectives, further enhancing the appreciation experience and fostering profound emotional connections with the artwork.
The integration of AIGC tools into the art appreciation process demonstrates direct applications in art education, particularly in interactive teaching, personalized learning, and interdisciplinary exploration. Learners can utilize AIGC tools to create portraits, analyzing key aspects such as composition, color, and emotional expression, thereby gradually enhancing their artistic perception. Furthermore, learners can modify their creations based on personal interests, achieving a personalized learning experience. Additionally, the AIGC-based art appreciation framework can be combined with disciplines such as psychology and data science to help learners explore the connections between artistic expression and human emotions, thereby deepening their interdisciplinary understanding of art. Beyond portraiture, this framework is equally applicable to other art forms. In landscape painting, learners can employ AIGC tools to analyze the use of color and compositional ratios in scenes, recreating works with varying seasonal or atmospheric effects during the modification process. In abstract art, learners can explore the relationships between geometric forms, color patterns, and emotions, designing personalized abstract artworks. In sculpture, AIGC aids in analyzing the influence of form, light, and material properties on perception, inspiring learners to create unique sculptural designs. In multimedia art, AIGC integrates visual, auditory, and interactive elements, enabling learners to collaboratively produce comprehensive artworks with multisensory experiences.
The application of AIGC tools in art education transcends traditional teaching methodologies, offering learners innovative and modernized learning pathways and promoting the popularization and diversification of art education. Despite the significant potential of AI in art appreciation, its application is still hindered by technical limitations and ethical considerations. When engaging in art appreciation, it is crucial to ensure that AI-generated content respects copyright and cultural appropriateness. Transparent data review mechanisms should be implemented to prevent intellectual property violations or misrepresentation of cultural values, while integrating multicultural semiotics and emotion recognition technologies to foster inclusive interpretations. In the context of art education, curricula should strengthen awareness of AI ethics, promote critical thinking, and institute robust oversight mechanisms involving artists, educators, and ethicists to regulate AI applications. Finally, to further enhance the value of AI in art education, future developments should integrate emotion recognition, cultural semiotics, and human psychology to improve AI’s cultural adaptability and emotional sensitivity, providing deeper assistance in art appreciation. Such developments will not only deepen the role of AI in art education but also provide learners with enriched aesthetic experiences and a more profound emotional understanding, driving innovation and fostering sustainable development in the field of art education.
Acknowledgements
Special thanks are extended to the scholars from the Graduate School of Creative Industry Design at the National Taiwan University of Arts and the group of experts who participated in the experiments for their invaluable assistance and constructive suggestions in this article.