Nonverbal Communication and Emojis Usage in Arabic Tweets: A Cross-Cultural Study

People spend much of their time communicating their thoughts, ideas, attitudes, and emotions on social media platforms such as Twitter. However, an important mode of communication, the nonverbal component, which requires visual and audible cues, is not available because of the text-based nature of these platforms. The aim of this research is to discover the alternative ways that Arabs across different dialects use to compensate for the absence of the nonverbal component. To that end, the researchers collected a corpus of tweets written in Arabic by using Python through the Twitter application programming interface (API). The results can be summed up as follows. Emojis helped Arabs to communicate their facial expressions; the most used emoji across the different dialects was Face with Tears of Joy, and the most used emojis reflected the universal emotions. Regarding hand gestures, the Egyptian dialect came first and the Emirati dialect second. Prosodic features such as the tone and loudness of the voice are expressed by means of character repetition. Punctuation usage across the Arabic dialects was limited, with Lebanese users seeming to use punctuation marks the most, and Arabs tend to replace punctuation marks with emojis. Finally, Arabs used vocal expressions such as interjections to communicate their affective state.


Introduction
Nonverbal communication is a system involving an assortment of features often used together to support expression [1]. From work examining the share of communication that is carried nonverbally, it is apparent that humans cannot communicate ideas, feelings, and attitudes efficiently without relying on nonverbal cues. However, in text-based computer-mediated communication such as social media platforms, communicators have only text, which makes it challenging for them to express nonverbal cues while delivering their messages. One of the situations that led the researchers to study the representation of nonverbal cues on Twitter is the misunderstanding that occurs between people while texting. For example, people may complain that a texter is raising his or her voice when a character is repeated inside a word, or that an emoji does not fit the context, although the verbal message is written correctly. This reflects the importance of nonverbal cues in written messages and shows that they can be stronger than the verbal content of those messages.
This research tries to answer the following questions. How do Arabs across different dialects compensate for the absence of nonverbal cues on Twitter? Does Twitter limit and restrict the use of nonverbal cues? How are nonverbal components such as facial expressions, paralanguage, and prosodic features expressed by means of emojis, character repetition, punctuation, and interjections?

Related Work
Few studies have examined nonverbal cues in text-based computer-mediated communication, especially on social networking platforms like Facebook and Twitter; however, in the following years with the increasing num-
Tantawi and Rosson [8] investigated the paralinguistic function of emojis on Twitter by examining 1,600 tweets. Their analysis had two main aspects: a topic analysis to find the most common topics on Twitter, and an analysis of the function of emojis. The tweets they analyzed exhibited three paralinguistic features for emojis: attitude, gesture, and topic. Their results showed that emojis are primarily used to signal attitudes and emotions. They also discussed implications for the design of Twitter and other text-based communication tools.
The study by Álvarez and Muñoz [9] has a twofold goal: to describe tweets from a contrastive discourse point of view, and to study a number of distinguishing features of language use that are specific to this social network in Spanish and English. One of the most important findings of the study is that the character limit Twitter imposes on its users did not cause communication failures; on the contrary, users were able to express themselves very well.

Methodology of the Study
To reach the research goals, the following stages were applied. The first stage was concerned with data collection, and the second stage focused on data cleaning and normalization. The third stage was the data analysis, which was done using Python scripts to obtain the pure structure information of the corpus and measurements for the categories of nonverbal devices, which are presented in Table 1.

Data Sample
The data were collected using Python through the Twitter API. Since the main purpose of the study is to examine nonverbal cues and the usage of emojis, the researchers collected the tweets by searching for 91 emojis that represent facial expressions and 20 hand-gesture emojis. For each emoji, 100 tweets were collected; for a very small number of emojis, fewer than 100 tweets were collected because those emojis are not widely used. In total, more than 10,000 tweets were obtained. Arabic was specified as the language in the search query, so only Arabic tweets were collected.
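The per-emoji collection described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `collect_by_emoji`, its `search` parameter, and `fake_search` are hypothetical names, and `fake_search` is a stand-in that fabricates placeholder strings instead of calling the Twitter API, whose client code the study does not describe.

```python
def collect_by_emoji(emojis, search, per_emoji=100, lang="ar"):
    """Collect up to `per_emoji` tweets for each emoji.

    `search` is any callable taking (query, lang, limit) and returning
    an iterable of tweet texts. It is a hypothetical interface, not a
    real Twitter client.
    """
    corpus = {}
    for emoji in emojis:
        results = search(query=emoji, lang=lang, limit=per_emoji)
        # Keep at most `per_emoji` tweets; rare emojis may yield fewer.
        corpus[emoji] = list(results)[:per_emoji]
    return corpus


def fake_search(query, lang, limit):
    """Stand-in for an API call: pretends Thinking Face is rarely used."""
    available = 3 if query == "\U0001F914" else limit
    return [f"{query} placeholder tweet {i}" for i in range(available)]


corpus = collect_by_emoji(["\U0001F602", "\U0001F914"], fake_search, per_emoji=5)
```

With 91 facial-expression emojis and 20 hand-gesture emojis at 100 tweets each, such a loop would return at most 111 × 100 = 11,100 tweets, which is consistent with the reported total of more than 10,000.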

Data Cleaning and Normalization
The corpus collected from the API contained duplicated tweets, and sometimes Urdu tweets were retrieved because Urdu uses letters borrowed from the Arabic alphabet. These duplicated and mistakenly collected tweets were removed.
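A minimal Python sketch of this cleaning step follows. The study does not say how Urdu tweets were identified, so the letter-based check here is an assumed heuristic: it flags a tweet when it contains letters that Urdu uses but standard Arabic orthography does not. The names `URDU_ONLY_LETTERS`, `looks_like_urdu`, and `clean_corpus` are hypothetical.

```python
# Letters used in Urdu but not in standard Arabic orthography
# (assumed heuristic): tteh, ddal, rreh, noon ghunna, yeh barree.
URDU_ONLY_LETTERS = set("\u0679\u0688\u0691\u06BA\u06D2")


def looks_like_urdu(text):
    """Return True if the text contains any Urdu-specific letter."""
    return any(ch in URDU_ONLY_LETTERS for ch in text)


def clean_corpus(tweets):
    """Drop exact duplicates (ignoring whitespace runs) and Urdu-looking tweets."""
    seen = set()
    cleaned = []
    for text in tweets:
        key = " ".join(text.split())  # collapse whitespace before comparing
        if key in seen or looks_like_urdu(key):
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


sample = [
    "\u0645\u0631\u062D\u0628\u0627",    # Arabic "marhaba"
    "\u0645\u0631\u062D\u0628\u0627",    # exact duplicate
    "\u0644\u0691\u06A9\u0627 \u06C1\u06D2",  # Urdu words containing rreh and yeh barree
]
cleaned = clean_corpus(sample)
```

Letters such as peh, tcheh, and gaf are deliberately left out of the set because some Arabic dialects also write them, so including them would risk discarding genuine Arabic tweets.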
Tweets that mentioned a location posed a challenge, since Twitter users could use multiple alternative names for the same country, or name the city they live in without mentioning the country. Such inconsistent data would make further analysis difficult. The researchers therefore decided to normalize the location of all tweets so that each country is referred to by a single name; for example, "egy", "Egypt, Alex", and "Alexandria" are normalized to "Egypt". All data cleaning and normalization was done using simple techniques in Excel spreadsheets, such as the remove-duplicates feature for removing repeated tweets.
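The study performed this normalization in Excel; the following Python sketch shows the same idea with a lookup table, as an illustration rather than the authors' procedure. The alias table contains only the examples given above, and the names `LOCATION_ALIASES` and `normalize_location` are hypothetical.

```python
# Alias table built only from the examples in the text; a real table
# would need an entry for every spelling found in the corpus.
LOCATION_ALIASES = {
    "egy": "Egypt",
    "egypt": "Egypt",
    "alex": "Egypt",
    "alexandria": "Egypt",
}


def normalize_location(raw):
    """Map a free-text profile location to a single country name.

    Returns None when no part of the location is recognized.
    """
    parts = [part.strip().lower() for part in raw.split(",")]
    for part in parts:
        if part in LOCATION_ALIASES:
            return LOCATION_ALIASES[part]
    return None


normalized = [normalize_location(loc) for loc in ["egy", "Egypt, Alex", "Alexandria"]]
```

Returning `None` for unrecognized locations keeps unknown values visible, so they can be reviewed by hand instead of being silently assigned to a country.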

Data Analysis and Results
After data cleaning and normalization, the analysis was performed on two main levels. The first level focused on examining emoji usage in Arabic and across different dialects, which was analyzed with the help of custom py-
Table 2 above presents the pure structure information for the whole corpus after data cleaning and normalization, and Table 3 shows the pure structure information for each dialect. From these results we may infer that the character limit Twitter imposes on its users is not problematic for native speakers of Arabic, as the average tweet is 69 characters long and contains 8 words. It also seems that the Saudi and Lebanese dialects use more characters than other dialects. The higher number of characters per tweet among Lebanese users may be related to the fact that they use punctuation marks more than speakers of other dialects, as is apparent in Table 8 below.
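Averages of this kind can be computed with a few lines of Python. The sketch below is an assumption about how such figures might be obtained, not the authors' script; `structure_info` and `structure_info_by_dialect` are hypothetical names, words are taken to be whitespace-separated tokens, and `len()` counts Unicode code points, which may differ from Twitter's own character counting.

```python
def structure_info(tweets):
    """Return the tweet count and average length in characters and words."""
    if not tweets:
        return {"tweets": 0, "avg_chars": 0.0, "avg_words": 0.0}
    total_chars = sum(len(t) for t in tweets)
    total_words = sum(len(t.split()) for t in tweets)  # whitespace tokens
    return {
        "tweets": len(tweets),
        "avg_chars": total_chars / len(tweets),
        "avg_words": total_words / len(tweets),
    }


def structure_info_by_dialect(rows):
    """Group (dialect, tweet_text) pairs and summarize each dialect."""
    grouped = {}
    for dialect, text in rows:
        grouped.setdefault(dialect, []).append(text)
    return {dialect: structure_info(texts) for dialect, texts in grouped.items()}


whole = structure_info(["ab cd", "abc"])
by_dialect = structure_info_by_dialect(
    [("Egypt", "a b"), ("Egypt", "abc"), ("Lebanon", "x")]
)
```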

Emojis
Emojis are graphic representations of facial expressions, body language, and hand gestures. They are one of the most important nonverbal devices that text-based communicators use to signal emotions and attitudes. People sometimes use the terms emoji and emoticon interchangeably; however, the two terms have different meanings. Emoticons are series of text characters (punctuation or symbols) that are used to form a gesture or facial expression textually [10], while emojis are the graphical icons that appear on keyboards, especially in mobile texting applications, and are used directly without the need for any textual characters. Emojis help convey the mood of a chat and the tone of a relationship. With experience of emojis, people may infer things that are not expressed concretely [11]. In this study only emojis were examined for native speakers of Arabic across dialects, as they are more widespread on social media platforms and more convenient for the purpose of the study. Emojis on Twitter are always counted as two characters. Table 4 and Table 5
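As an illustration of how emoji frequencies and an emoji-aware tweet length could be computed, here is a hedged Python sketch. The code point ranges are an approximation chosen for this example, not a complete definition of emoji, and `EMOJI_RANGES`, `is_emoji`, `emoji_counts`, and `weighted_length` are hypothetical names. The length function simply applies the two-characters-per-emoji rule stated above.

```python
from collections import Counter

# Approximate code point ranges (an assumption, not a full emoji definition).
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs (hand gestures)
    (0x1F600, 0x1F64F),  # Emoticons block (face emojis)
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
]


def is_emoji(ch):
    """Return True if the character falls in one of the assumed ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in EMOJI_RANGES)


def emoji_counts(tweets):
    """Count every emoji occurrence across a list of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(ch for ch in text if is_emoji(ch))
    return counts


def weighted_length(text):
    """Tweet length with each emoji counted as two characters."""
    return sum(2 if is_emoji(ch) else 1 for ch in text)


counts = emoji_counts(["\U0001F602\U0001F602 \u0647\u0647\u0647", "\U0001F914"])
top_emoji, top_count = counts.most_common(1)[0]
```

A per-character check like this treats skin-tone modifiers and joined sequences as separate emojis, so counts for hand gestures would need extra care in a real analysis.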

Paralanguage and Prosodic Features
After facial expressions, one of the most important sets of nonverbal cues is that of the voice, such as tone, pitch, and accent. These cues have the power to change the meaning of words and to reflect the speaker's emotional state, whether an utterance is a statement, question, or command, and whether a specific idea is emphasized or in focus. When Twitter users wanted to emphasize and intensify their emotions, they used character repetition to compensate for the absence of prosodic features. In prosodic terms, the repetition of letters would seem to correspond to the feature of duration, and capitalization to the feature of loudness [12]. Since Arabic lacks capitalization, both duration and loudness were expressed by means of character repetition. Character repetition happens in vowels more than in consonants, and it also happens with emojis. Table 6 shows examples from Arabic dialects; clearly, no dialect lacked character repetition as a means of expressing nonverbal prosodic features.
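Character repetition of this kind can be detected with a regular expression that uses a backreference. The sketch below is one possible approach, not the method used in the study. The threshold of three or more identical characters is an assumption, chosen because doubled letters can occur in ordinary spelling; `REPEAT`, `repeated_runs`, and `squeeze` are hypothetical names.

```python
import re

# A non-space character followed by at least two more copies of itself.
REPEAT = re.compile(r"(\S)\1{2,}")


def repeated_runs(text):
    """Return (character, run_length) for every run of 3+ identical characters."""
    return [(m.group(1), len(m.group(0))) for m in REPEAT.finditer(text)]


def squeeze(text):
    """Collapse each run of 3+ identical characters to a single character."""
    return REPEAT.sub(r"\1", text)


# The word "jamil" (beautiful) written with the letter yeh repeated four times.
stretched = "\u062C\u0645\u064A\u064A\u064A\u064A\u0644"
runs = repeated_runs(stretched)
plain = squeeze(stretched)
emoji_runs = repeated_runs("\U0001F602" * 3)
```

Because Python strings are sequences of code points, the same pattern also catches repeated emojis, which matches the observation that repetition happens with emojis as well as letters.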
Punctuation not only conveys a great deal about grammatical structure, but also compensates for the prosody and paralinguistic features of speech that are absent in written communication [13]. Arabs' use of punctuation marks is very limited, especially for the full stop; the comma is the most used mark, the exclamation mark comes second, and the question mark third. Emojis seem to be replacing punctuation marks. For example, the exclamation mark in some tweets is replaced by or accompanied by Frowning Face with Open Mouth (e.g., ياترا من يروح وقتنا معقول يجي احد احسن من بتس, which translates as "I wonder, as time passes, is it possible that someone better than BTS will come"). The question mark is likewise replaced by or accompanied by Thinking Face (e.g., ليش التفرقة, which translates as "why would you make a difference"). The Lebanese dialect uses more punctuation marks than the other dialects, and the Emirati dialect almost never uses them. However, the Emirati dialect uses more emojis, which may be replacing punctuation marks. People feel very comfortable communicating on social media platforms and do not concern themselves with the grammatical structure of their sentences. This shows in the low percentage of punctuation marks per tweet; when punctuation is used, the main reason is to convey emotions and attitudes. Table 7 and Table 8 show the punctuation marks used in the whole corpus and among the different dialects.
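Punctuation usage of this kind can be tallied with a small lookup table, sketched below in Python. The set of marks is an assumption for illustration: it covers the four marks named above and treats the Arabic comma and Arabic question mark as equivalents of their Latin counterparts. `PUNCTUATION`, `punctuation_counts`, and `share_with_punctuation` are hypothetical names.

```python
from collections import Counter

# Marks named in the text; Arabic-script variants map to the same label.
PUNCTUATION = {
    ".": "full stop",
    ",": "comma",
    "\u060C": "comma",          # Arabic comma
    "!": "exclamation mark",
    "?": "question mark",
    "\u061F": "question mark",  # Arabic question mark
}


def punctuation_counts(tweets):
    """Count occurrences of each punctuation mark across all tweets."""
    counts = Counter()
    for text in tweets:
        for ch in text:
            if ch in PUNCTUATION:
                counts[PUNCTUATION[ch]] += 1
    return counts


def share_with_punctuation(tweets):
    """Fraction of tweets that contain at least one punctuation mark."""
    if not tweets:
        return 0.0
    with_marks = sum(1 for t in tweets if any(ch in PUNCTUATION for ch in t))
    return with_marks / len(tweets)


sample = ["\u0644\u064A\u0634\u061F", "a, b, c!", "no marks"]
marks = punctuation_counts(sample)
share = share_with_punctuation(sample)
```

Running the same two functions on the tweets of each dialect separately would give per-dialect figures comparable in spirit to those reported in Table 8.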

Interjections
Interjections are vocal expressions of emotion. They are sound sequences, words, typical phrases, or clauses which can be realized as utterances, signaled in speech by being produced with greater intensity, stress, and pitch, and in writing as sentences marked by an exclamation mark [14]. In Arabic, interjections are of two types: nouns of sounds (Asmaa' Al-Aswaat) and nouns of verbs (Asmaa' Al-Af'aal) [15]. Some researchers classify interjections into three main classes that reflect the speaker's mental state or act: emotive, volitive, and cognitive interjections (Wierzbicka 1992, as cited in [15]).

Table 9. Examples of interjections found in the tweets of the corpus.

Interjection Example    Meaning
"واو"    Transliterated from the English interjection "wow", which is used as an exclamation of surprise.
"يع"    Equivalent to the English interjection "yuk", which is used to express disgust and displeasure.
"آخ"    Equivalent to the English interjection "ouch", which is used to express sudden pain and discomfort.
"صه"    Equivalent to the English interjection "shh", which is used to order someone to be quiet.

speaker's emotive state, and some of the interjections used are transliterations of English interjections.
nonverbally without the need for long written messages, so instead of writing "It is a disgusting thing" they can write "Yuk." and the same message is still delivered.
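A lexicon lookup is one simple way to find such interjections in tweets. The Python sketch below is an illustration under stated assumptions: the lexicon holds only the four interjections from Table 9, runs of three or more identical letters are collapsed before the lookup so that stretched spellings still match, and `INTERJECTIONS`, `REPEAT`, and `find_interjections` are hypothetical names.

```python
import re

# The four interjections from Table 9, written with Unicode escapes.
INTERJECTIONS = {
    "\u0648\u0627\u0648": "wow (surprise)",  # waw-alef-waw
    "\u064A\u0639": "yuk (disgust)",         # yeh-ain
    "\u0622\u062E": "ouch (pain)",           # alef madda-khah
    "\u0635\u0647": "shh (be quiet)",        # sad-heh
}

# A word character followed by at least two more copies of itself.
REPEAT = re.compile(r"(\w)\1{2,}")


def find_interjections(text):
    """Return (token, meaning) for each token that matches the lexicon."""
    found = []
    for token in re.findall(r"\w+", text):
        base = REPEAT.sub(r"\1", token)  # undo character repetition
        if base in INTERJECTIONS:
            found.append((token, INTERJECTIONS[base]))
    return found


# "wow" stretched with three alefs, followed by the word "helw" (nice).
stretched_wow = "\u0648\u0627\u0627\u0627\u0648"
hits = find_interjections(stretched_wow + " \u062D\u0644\u0648")
```

Exact matching on short tokens can produce false positives, since a short string may also be an ordinary word or a letter name, so results from a lookup like this would need manual checking.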

Conclusion
Text-based computer-mediated communication on social media platforms like Twitter does not limit or restrict the use of nonverbal cues. On the contrary, Twitter users find creative ways to express the nonverbal component: emojis to represent their facial expressions, punctuation and character repetition to reflect paralinguistic and prosodic features, and vocal expressions such as interjections to express their emotional state. Among Arabs specifically, there seems to be little difference across dialects in the use of emojis, but significant differences were found in the use of gestures. Face with Tears of Joy was the most used emoji across all dialects. Egyptians seem to use hand gestures more than speakers of other dialects, while the Emirati and Lebanese dialects use them less frequently. The number of emojis per tweet is nearly the same across dialects, with an average of 3 emojis per tweet. Arabs rarely use punctuation marks, and emojis appear to be replacing them. The most used punctuation mark was the comma, followed by the exclamation mark. The Lebanese dialect uses punctuation marks at a higher rate than other dialects, which is reflected in its higher number of characters per tweet. Character repetition was pervasive across all dialects; it reflects the intensification of emotions and marks words that are pronounced louder than others.
Arabs use interjections to reflect their emotional state: instead of writing a full sentence, they write only a vocal expression.