The Construction of Sichuan Image under Multimodal Visual Grammar—Taking the Documentary “Aerial China (Sichuan)” as an Example

Based on the visual grammar proposed by Gunther Kress and Theo van Leeuwen, this paper takes visual text analysis as its research method and explores how the image of Sichuan Province, with its “Sichuan characteristics”, is dynamically constructed along the three dimensions of “representational meaning”, “interactive meaning” and “compositional meaning”, by analyzing the image modalities in the national geographic documentary “Aerial China (Sichuan)”.


Introduction
Communication in today's world is multimodal: people integrate a range of semiotic resources to communicate and exchange ideas.
Multimodal discourse integrates a variety of social symbols to deepen visual, auditory, tactile and other sensory experiences. As a typical multimodal discourse, a documentary is composed of language, image, sound and other modalities. It combines text and image modalities to give the audience a rich sensory experience and leave a strong visual memory.
The documentary "Aerial China" produced by CCTV uses a full aerial image narrative to show China's beautiful natural landscapes and colorful ecological environment, highlighting the brilliant achievements of China's economic construction and presenting China's image from a unique bird's-eye perspective. It's

Research Status of Multimodal Discourse Analysis
Since the 1990s, the multimodal turn in functional linguistics and discourse analysis has made multimodal discourse analysis a major topic in linguistics and communication research. Its theoretical basis is Halliday's systemic functional linguistics. Social semioticians have established semantic grammar systems to describe symbols such as visual images (Kress & van Leeuwen, 1996).
In "Reading Images: The Grammar of Visual Design (Narrative Representation)", Kress and van Leeuwen (1996) draw on the three metafunctions of language in systemic functional linguistics and divide the multimodal meaning system into representational, interactive and compositional meaning, which correspond to the ideational, interpersonal and textual meanings of language. This yields the analytical framework of visual grammar and develops social semiotics on that basis. In "Handbook of Visual Analysis", van Leeuwen & Jewitt (2001) introduce visual analysis in anthropology, cultural studies, psychoanalysis, ethnological studies, and film and television studies, as well as image content analysis and social semiotic analysis.
International studies show new trends: multimodal research is increasingly interdisciplinary, critical discourse analysis is maturing, and non-linguistic symbols are being interpreted on the basis of cognitive theory, corpus analysis or empirical psychological research.
Multimodal research has also made progress in China. Since Li Zhanzi (2003) first introduced Kress and van Leeuwen's multimodal visual grammar to China in "Social Semiotic Analysis of Multimodal Discourse", Chinese scholars have carried out extensive and in-depth discussion of advertising discourse, news reports, film and television, classroom teaching, natural conversation and other multimodal forms from the perspectives of systemic functional visual grammar, multimodal metaphor, multimodal corpora, multiliteracies and others.
As far as research methods are concerned, researchers have begun to use digital technology to annotate and simulate complex multimodal texts, build multimodal corpora and develop multimodal retrieval software (Baldry & Thibault, 2008; Gu, 2006). Meanwhile, empirical research on viewers' cognition of multimodal discourse has been growing, using questionnaire surveys, eye-tracking experiments and even brain imaging technology (Gidlof et al., 2012; Muller et al., 2012).
In terms of application, existing research focuses on static media such as print ads, political cartoons, posters, foreign language teaching materials and photographic images, and on dynamic media such as TV advertisements, films and propaganda; there is little research on the meaning construction of national geographic image documentaries. This paper takes the single-frame image as the unit of analysis and, under the framework of visual grammar, integrates the image and discourse meaning resources in "Aerial China (Sichuan)" in terms of shooting techniques and shooting angles, exploring how meaning is presented in multimodal documentary discourse.

Multimodal Discourse Analysis of Documentaries
According to Halliday's (1978) social semiotic theory, neither linguistic nor non-linguistic symbols can be regarded as invariable semantic codes; rather, they are resources for constructing meaning in particular contexts. Therefore, the visual "grammar" of Kress & van Leeuwen (1996) is a systematic description of the meaning-making resources of images rather than a set of rigid rules. Kress (2010) defines modality as "the symbol resources that create meanings in social culture" and believes that "any modality (such as image, gesture, music) is a complete ideographic system which contains expression plane, lexicogrammar and discourse semantics like language".

Analysis of the Representational Meaning
Representational meaning is the foundation of multimodal discourse construction and corresponds to the ideational function among Halliday's metafunctions of language. The representational meaning of an image comprises two parts: narrative representation and conceptual representation. Semiotic resources can reflect the states of the real world and the relationships between things. In visual grammar, narrative representation presents unfolding actions, processes of events and transitory spatial arrangements.
Narrative representation includes action processes, reaction processes, and speech and mental processes.
In an action process, elements form diagonal lines, and strong diagonal lines usually form vectors. The narrative vector carries the interaction of one or more participants and links the constituent elements. The actor is the participant from which the vector emanates, which highlights the actor's status. When participants are connected by a vector, they are represented as doing something to or for each other.
In a reaction process, the vector is formed by the gaze of a participant, and the gaze vector has a definite direction. The gaze of the active participant points to another participant. The former is referred to as the "reactor" and the latter as the "phenomenon".
The documentary "Aerial China (Sichuan)" shows humanistic Sichuan through narrative representation. The face-changing performance is a signature attraction of Chengdu's scenic spots and a calling card of Sichuan culture. The narrative meaning of Figure 1 is that "face-changing performance is loved by the audience", echoing the commentary that "Sichuan opera is the 'Sichuan cuisine' in Chinese opera. Face-changing is a unique skill in Sichuan opera. The faces of joy, anger, sadness and happiness can be switched instantly". The close-up shots of face-changing and fire-breathing, together with the representation of the audience's reaction, constitute the non-transactional action process of the performer and the reaction process of the audience. The audience sitting below the stage and watching the face-changing performance forms the vector of the reaction process in the narrative representation ("audience watching the performance"), and the "watching" process is

Interpretation of the Interactive Meaning
Interactive representation corresponds to the interpersonal function of Halliday's  The reference standard of the attitude dimension is the "view angle", which expresses the meanings of "involvement" and "power" through five view angles in the horizontal and vertical directions. In the horizontal direction, the frontal angle and the oblique angle determine the degree of "involvement" in the image. In the vertical direction, the "low angle", "level angle" and "high angle" determine the attitude of the viewer towards the image participants.
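The attitude dimension described above can be read as a small lookup scheme for annotating single frames. The following Python sketch is illustrative only: the names `FrameAnnotation`, `HorizontalAngle` and `VerticalAngle` are hypothetical and not part of the original study, and the mapping of angles to readings (frontal to involvement, oblique to detachment, high angle to viewer power, level angle to equality, low angle to participant power) is assumed from the usual reading of Kress and van Leeuwen's visual grammar. Dimensions that the analysis does not state for a shot are left unset rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class HorizontalAngle(Enum):
    FRONTAL = "frontal"
    OBLIQUE = "oblique"


class VerticalAngle(Enum):
    LOW = "low"
    LEVEL = "level"
    HIGH = "high"


# Assumed readings of each angle (hypothetical labels for illustration).
INVOLVEMENT = {
    HorizontalAngle.FRONTAL: "involvement",
    HorizontalAngle.OBLIQUE: "detachment",
}
POWER = {
    VerticalAngle.LOW: "participant power",
    VerticalAngle.LEVEL: "equality",
    VerticalAngle.HIGH: "viewer power",
}


@dataclass(frozen=True)
class FrameAnnotation:
    """One single-frame unit of analysis with its optional view angles."""
    label: str
    horizontal: Optional[HorizontalAngle] = None
    vertical: Optional[VerticalAngle] = None

    def attitude(self) -> Dict[str, str]:
        """Return only the attitude readings that the annotation supports."""
        reading: Dict[str, str] = {}
        if self.horizontal is not None:
            reading["involvement"] = INVOLVEMENT[self.horizontal]
        if self.vertical is not None:
            reading["power"] = POWER[self.vertical]
        return reading


# Only the angles named in the analysis are filled in.
panda = FrameAnnotation("pandas and keepers", horizontal=HorizontalAngle.FRONTAL)
buddha = FrameAnnotation("Leshan Giant Buddha", vertical=VerticalAngle.LOW)

for frame in (panda, buddha):
    print(frame.label, frame.attitude())
```

Keeping the two angles as separate optional fields mirrors the way the attitude dimension is split into a horizontal and a vertical axis, so a frame can be annotated on one axis without committing to a reading on the other.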
In the documentary visual text, the lens is dynamic, and the horizontal angle  Pandas run after their keepers, which is an action process in terms of narrative representational meaning. The frontal angle places the viewer in the same scene, as if playing with the pandas, forming a connection between the viewer and the pandas and narrowing the distance between them, so that the viewer is immersed in the image. The representation of panda life in the Wolong Conservation Park shows that the giant panda, a national treasure, is well cared for by humans in the protected area, and reflects Sichuan's efforts in giant panda protection and research and in the construction of ecological civilization.
Sichuan also has many famous tourist attractions. As the tallest stone Buddha statue in the world, the Leshan Giant Buddha embodies Chinese Buddhist culture. In the documentary, long shots, low-angle shots and close-ups are used to capture the Leshan Giant Buddha. In Figure 7, the Giant Buddha is the largest Maitreya Buddha in the world, with its head level with the top of the mountain, its feet on the river, its hands on its knees, a well-proportioned body and a solemn expression: "It is 71 meters high, with 6-meter-long ears, 3.2-meter-long nose, 2.46-meter-wide left eye and 2.45-meter-wide right eye." The close-up of the Giant Buddha's head lets the audience look at the Buddha closely. The combination of long shots and close-ups enhances the audience's sense of interaction and realizes the interaction between images and viewers. The low-angle shot presents the Giant Buddha to the viewer from below. Here the Buddha appears awe-inspiring while the viewer is placed in a weaker position, which highlights the Buddha in the composition.
The Leshan Giant Buddha is a rare stone sculpture. Because of the cultural understanding of Buddhism and the natural awe of Buddha statues in traditional Chinese thought, the effect of the low-angle shot resonates with the viewer's own perceptions.
The magnificent but precipitous scenery of Sichuan is the result of hundreds of millions of years of geological movement. As a natural landscape, the precipitous   Among them, salience refers to attracting viewers' attention through a range of devices. In the documentary visual text, an object is emphasized through the relative size of the image participants, the sharpness of focus, color contrast, visual position, perspective and special cultural symbols.

Interpretation of the Compositional Meaning
In the documentary, color contrast and overhead shots are combined to connect representational meaning and interactive meaning and thereby present compositional meaning. Figure 9 shows the chili red of Pixian thick broad-bean  documentary uses the technique of color contrast to highlight the color characteristics of the images, which is an application of salience in compositional meaning. Huanglong Scenic and Historic Interest Area is a world-class calcified landscape. Viewed from the sky, the natural calcified beach is majestic with its golden water flow, "looking like a long golden dragon" (Figure 10). The camera moves from left to right across the calcified beach and finally presents a panorama of it, which is in line with people's visual perception.
In Figure 11, the shot of the Daocheng Yading Scenic Area uses color contrast.
The golden meadow and colorful forest are interlaced, which attracts the atten-

Conclusion
This paper treats the documentary as a complete, dynamic multimodal discourse.
The image modality and the commentary text modality are regarded as resources with meaning-making functions. Visual grammar is used to analyze the integration of "narrative meaning", "representation meaning" and "interactive meaning" in multiple scenes of the documentary, and thereby to interpret the overall narrative significance and image of "Aerial China (Sichuan)". This paper is intended to enrich the perspective of multimodal discourse analysis and enhance the interdisciplinary nature of multimodal research, that is, to analyze documentaries with knowledge from linguistics, semiotics, and film and television media. By treating the documentary as a whole and complete multimodal discourse, combining pictures with commentary and going beyond the limitation of the single image, it investigates the significance of continuous visual narrative discourse and offers some inspiration for multimodal research.