Social Networks, Politics and Public Views: An Analysis of the Term “Macedonia” in Twitter

In this paper we deal with Twitter and the presence of the keyword “Macedonia” in tweets over a period of time. We searched for the same term in three different languages, i.e. “Μακεδονία”, “Macedonia” and “Македонска -Македонија”, since we are primarily interested in views from Greece and FYROM without excluding views from other regions. We use methods from Social Network Analysis (SNA) to create networks of users, calculate the main network metrics, measure user importance and investigate possible fragmentation into communities. We furthermore proceed to a form of content analysis, using pairs of words within tweets, in order to obtain the main ideas, trends and public views that circulated over the network.


Introduction
According to Filippas [1], new ideas concerning mainly psychological factors in financial decisions, together with innovation, fast decision making and even philosophical views about growth and political systems, now play a very important role and certainly complement the classical economic view. One serious aspect of this view is the way that information about financial or political news spreads and influences decision makers at all levels of an organization, or even in private decisions. The number of people who have access to large amounts of information has dramatically increased, while beliefs and emotions are widely spread by many people through the globalization of media, the rapid growth of the Internet and the progress in mobile communications. Filtering the available information, if done in "good will", may in principle be helpful in any decision process; however, "good will" is often challenged, and allegations about deliberate misinformation with the intent to mislead public opinion have become commonplace. The terms "fake news" and "post-truth politics" [2] have recently been introduced in an attempt to explain the different ways in which social media use private data to filter opinions, views or news, resulting in largely fragmented information and a high likelihood of taking "lies for truths".
One of the most controversial topics in Greek politics and public discussion is the "Macedonian Issue". This geopolitical problem, which concerns the geographical region of Macedonia, emerged after the Berlin Treaty in the 19th century between Greece, Bulgaria, Serbia and the Ottoman Empire. It was supposedly closed after the Balkan wars and the end of the First World War, but was re-opened after the Second World War between Greece, Yugoslavia (at first) and Bulgaria, and later on mainly between Greece and FYROM (Former Yugoslav Republic of Macedonia). One of the thorniest issues in this problem is the name of the State of FYROM, together with the use of the term "Macedonian" as a nationality or recognized language. After decades of dispute, a preliminary treaty (a Memorandum of Understanding) was prepared between Greece and FYROM, and this treaty was signed by the two countries' prime ministers on the 20th of June, 2018 at the border lake of Prespes. This treaty will have to be validated through a series of steps, including a referendum in FYROM (in September 2018) and subsequent ratification by vote in both countries' Parliaments.
The purpose of this paper is not to discuss the actual dispute from a historical or diplomatic point of view, or to take a positive or negative position on the treaty, but to investigate its presence in the Social Networking sphere and especially on Twitter. Twitter is an American online news and social networking service, on which users post and interact with messages known as "tweets". Tweets were originally restricted to 140 characters, but on November 7, 2017, this limit was doubled for almost all languages. Registered users can post tweets, but those who are unregistered can only read them. Users access Twitter through its website interface, Short Message Service (SMS) or mobile-device application software ("app"). In a number of recent papers [3] [4], Twitter data regarding political views have been extracted and processed in order to retrieve information about political agendas or other issues of great local or even global interest.
Probably the most appropriate way to investigate Twitter's influence is Social Network Analysis (SNA). According to Freeman [5] and Wasserman and Faust [6], in SNA a social structure is formed by patterns or regularities of relations which develop between interacting units. Typical social research focuses on characteristics and attributes of single units (persons), while SNA focuses on relations and interactions between the acting subjects [6] [7] [8] [9]. SNA is now a fully developed field, containing a number of theories, techniques and metrics that give great insight into social relations. In the following section of this paper we discuss the processes of data collection together with their limitations, network formation and visualization. In order to form a complete view, we create three different networks by searching Twitter for keywords in three different languages. We also produce and discuss some general, macroscopic metrics related to our networks and describe some processes used to check and filter out possibly uninteresting tweets.
In Section 3 we calculate and discuss clustering in communities based mainly on the actual content of the tweets. We also repeat the above calculations in an overall merged network, where we also seek interconnections between the three original networks. In Section 4 we calculate and discuss the importance of nodes in terms of centrality measurements. Finally, in Section 5 we create three networks of word pairs found within the actual tweets and thus proceed to a form of content analysis. We conclude with our final observations, together with some general insights and possible directions for future research.
This type of research is more qualitative than quantitative. It primarily focuses on relations between actors or word adjacencies and not on computing statistical metrics on persons or words. To our knowledge, this is the first time that such an analysis has been carried out involving different languages and a hard international issue.

Data Collection and Networks' Formation
A number of software tools may assist in collecting information, calculating numerical results and providing visualizations of networks. In this paper we use NodeXL [10] to import data, create networks, calculate metrics and investigate word adjacencies.
NodeXL incorporates a method to import tweets containing a particular keyword. After this import, networks of users (users are represented by vertices, or nodes) are created based on the responses (mentioning, retweeting, etc.) a particular tweet attracted. An edge (link) is created every time such a response is made, so duplicate edges may exist. A tweet with no responses at all results in a self-loop. This approach has a number of limitations, mainly because of the vast amount of data circulating over Twitter. The tweets retrieved date back between 7 and 10 days from the day of the search, and the maximum number is set at 10,000 tweets. The procedure stops when one of the two limits is reached.
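The edge-creation rules described above can be illustrated with a minimal sketch. It is not NodeXL's implementation: the record fields `author` and `targets`, and the choice to direct each edge from the tweeting user to the user being mentioned, replied to or retweeted, are assumptions made for illustration only.

```python
from collections import Counter

def build_edges(tweets):
    """Turn tweet records into a directed edge list.

    Each tweet is a dict with an 'author' and a list of 'targets'
    (users it mentions, replies to or retweets). These field names
    are hypothetical and do not come from NodeXL.
    """
    edges = []
    for tweet in tweets:
        author = tweet["author"]
        targets = tweet.get("targets", [])
        if not targets:
            # A tweet that interacts with no one becomes a self-loop.
            edges.append((author, author))
        for target in targets:
            # One edge per interaction, so duplicate edges are kept.
            edges.append((author, target))
    return edges

tweets = [
    {"author": "a", "targets": ["b"]},
    {"author": "a", "targets": ["b", "c"]},
    {"author": "d", "targets": []},
]
edges = build_edges(tweets)
edge_counts = Counter(edges)  # multiplicity of each duplicate edge
```

Keeping duplicates rather than collapsing them preserves how often two users interacted, which can later serve as an edge weight.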
In our case we started all our searches on the 21st of June, 2018, one day after the preliminary Treaty was signed, as already mentioned. Since we are interested in all possible views, we searched for keywords in three different languages, that is, in Greek ("Μακεδονία"), in Cyrillic ("Македонска -Македонија") and in English ("Macedonia"). Keywords in non-English character sets are searched after being transcribed into their percentage notation. It is the authors' view that searching for this term should be broad enough to encapsulate all possible views on this dispute. Hence, in our case, three networks are assembled. In Table 1 we present some of their macroscopic characteristics. Large scores in modularity (the theoretical maximum is 1) mean that the networks can be clustered into rather many groups. Clustering will be the next step in our analysis.
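As an illustration of the percentage notation mentioned above, and assuming it refers to standard URL percent-encoding of the UTF-8 bytes of each keyword, the transcription can be sketched with Python's standard library. The keyword list below is taken from the text; the exact query strings actually submitted are not known to us.

```python
from urllib.parse import quote, unquote

# Keywords from the three searches; the Cyrillic search is represented
# here by a single term for brevity.
keywords = ["Μακεδονία", "Македонија", "Macedonia"]

# Percent-encode every non-ASCII character as its UTF-8 bytes.
encoded = {kw: quote(kw) for kw in keywords}

# Decoding restores the original keyword, so the transcription is lossless.
decoded = {kw: unquote(enc) for kw, enc in encoded.items()}
```

Plain ASCII keywords such as "Macedonia" pass through unchanged, while each Greek or Cyrillic letter becomes two `%XX` groups.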

Communities within the Produced Networks
In order to locate different groups of users, who probably discuss different or opposite aspects of our search, we now proceed to clustering in communities. A community is a set of vertices within a network that has more edges among its members than to vertices outside it. We use the Clauset-Newman-Moore algorithm [11] to calculate communities in all three networks. In Table 2 we show the most important macroscopic characteristics of the five largest communities in the ENGLISH (filtered), GREEK and FYROM networks respectively. From Table 2, the most striking result is that the largest groups in the first two networks consist of isolates (self-loops), with zero density and zero average shortest path. The exception is the FYROM network, which seems to be much more active and interconnected.
This result is again extremely interesting, since it implies that Twitter users in FYROM (or anywhere else, but using the Cyrillic form of the keyword) are much more active and actually read and respond to tweets, thus creating a real conversation instead of "shouting alone". Still, since geo-location cannot be used, one might consider a very active community abroad (e.g. in Australia or Canada), where Twitter is far more widely used in everyday life than in FYROM or Greece. It also seems that in the GREEK case a sense of "selfishness" is detected.
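The modularity score that guides this kind of clustering can be made concrete with a short sketch. The Clauset-Newman-Moore algorithm greedily merges groups so as to increase modularity; the code below does not implement that algorithm and only evaluates the modularity of a given partition of an undirected graph. The toy graph and the community labels are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c
    and d_c the total degree of the vertices in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives a clearly positive score, whereas placing every vertex in one group gives zero, which is why higher modularity indicates a network that separates well into groups.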
In Figures 1-3 we show the three networks, which support the above discussion. All visualizations were created in NodeXL. The names of the larger communities are also shown at the top left of each group. Within groups, a treemap-style layout is used.
By simple observation of the three figures, yet another important result emerges. Apart from isolates (self-loops), there seems to be a rather large number of small groupings (2 to 5 persons) who engage in discussions. The number of these groups is particularly large in the GREEK network. In the ENGLISH (filtered) and FYROM cases, the number of small groups decreases significantly. Furthermore, the GREEK network resembles a quite immature network in terms of the connections between its vertices, as if it were at an earlier stage of its evolution, despite the fact that all three networks were created on the same dates. It looks like a random Erdős-Rényi network [12] that is only beginning to form connected components, well behind the other two networks. Even visually, the same result occurs here: in Greece, Twitter users are rather few in absolute numbers and/or do not use Twitter to engage in conversations of this kind.
Still, without a closer inspection of the actual tweets, all results come from calculations and observations on structure. At this point, in order to gain insight into the actual conversation, we proceed to reading some of the tweets within groups. The selection of tweets to be examined is straightforward: tweets that correspond to "central" nodes within groups should bear the main ideas (or key phrases) discussed in these groups. By "central" we mean nodes that lie in the central area of every group (a more mathematical meaning will be used in the next section). The ENGLISH (filtered) and GREEK networks were rather easy to inspect, whereas for the FYROM network we turned to Google Translate (hence there might be some biases in the actual content). Note that NodeXL provides the ability to inspect the content of a tweet by clicking on the corresponding node.
In Table 3 we present the main ideas discussed in each of the five largest groups in all three networks. From Table 3, it is clear that in the ENGLISH (filtered) and FYROM networks there seems to be a balanced conversation, whilst in the GREEK network almost all views are completely negative. Furthermore, one should note that grouping in communities does not mean that there are barriers. As seen in Figures 1-3, inter-group links do exist, connecting communities that may contain similar or completely different views. A careful observer can inspect these links and make their own judgments; however, this is outside the scope of this paper. As a final step in this Section we now turn to the possible existence of common ground between the three networks. In order to do this, we first merge the three networks into one, then partition the nodes according to their original network and check for possible interlinks between partitions. The resulting merged network is shown in Figure 4, where we removed nodes with degree less than or equal to 2 in order to reduce unnecessary noise.

Table 3 (excerpt):
G1: No main ideas (only isolates)
G2: Actual disputes, conversation between Greeks and citizens of FYROM using English (e.g. "Macedonia is Greek", "I am a Macedonian and no one will change it", etc.)

From Figure 4, it is clear that there exists a conversation between the three networks. The ENGLISH (filtered) network has many links to the FYROM network and fewer to the GREEK network, while the FYROM and GREEK networks seem to interact, but in a limited manner. With respect to the ENGLISH network, it seems reasonable that users from both countries either include the keyword "Macedonia" or react to international tweets. The most important observation here concerns the existing links between the GREEK and FYROM networks. However, a closer inspection of the actual tweets does not reveal anything of particular interest, since they are mostly retweets or mentions over a long list of users that originated from a tweet on the language spoken by Alexander the Great.
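The merge, partition and filtering steps described in this section can be sketched as follows. This is a minimal illustration rather than the NodeXL procedure: the user names are invented, and the rule that a user who appears in several networks is assigned to the first network in which they are seen is an assumption made here.

```python
from collections import Counter

def merge_networks(networks):
    """Merge edge lists and remember each node's original network."""
    merged, origin = [], {}
    for name, edges in networks.items():
        for u, v in edges:
            merged.append((u, v))
            # Assumption: the first network a user appears in wins.
            origin.setdefault(u, name)
            origin.setdefault(v, name)
    return merged, origin

def cross_links(merged, origin):
    """Count edges whose endpoints belong to different partitions."""
    counts = Counter()
    for u, v in merged:
        if origin[u] != origin[v]:
            counts[frozenset((origin[u], origin[v]))] += 1
    return counts

def drop_low_degree(edges, max_degree=2):
    """Remove nodes with degree <= max_degree, together with their edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {n for n, d in degree.items() if d > max_degree}
    return [(u, v) for u, v in edges if u in keep and v in keep]

networks = {
    "ENGLISH": [("alice", "bob"), ("bob", "carol")],
    "GREEK": [("dimitra", "eleni"), ("eleni", "bob")],
    "FYROM": [("goran", "ana"), ("ana", "bob"), ("ana", "alice")],
}
merged, origin = merge_networks(networks)
links = cross_links(merged, origin)
core = drop_low_degree(merged)
```

In this toy input the ENGLISH partition is linked to both of the others while GREEK and FYROM share no direct edge, and the degree filter leaves only the edge between the two best-connected users.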

Important Vertices
We now turn our attention to a more microscopic level and try to locate actual users who play important roles in the three networks. We calculate the betweenness centrality metric and rank users according to their scores. We choose betweenness centrality because this metric is considered to reflect the nodes that are important for quickly passing information among all other nodes [3].
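To make the ranking step concrete, the sketch below computes betweenness centrality with Brandes' algorithm for an unweighted, undirected graph and ranks the nodes by score. It is an illustration only: NodeXL's own computation may differ in normalization or in the treatment of duplicate edges, and the toy graph is hypothetical.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm; adjacency maps each node to a set of neighbours."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []                            # nodes in BFS visiting order
        preds = {n: [] for n in adjacency}    # shortest-path predecessors
        sigma = dict.fromkeys(adjacency, 0)   # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve.
    return {n: s / 2 for n, s in score.items()}

def to_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

# Toy graph: two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
scores = betweenness(to_adjacency(edges))
ranking = sorted(scores, key=scores.get, reverse=True)
```

The two bridge endpoints receive the highest scores because every shortest path between the two triangles passes through them, which is the sense in which high-betweenness users relay information between otherwise separate parts of a network.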
It is important to locate such nodes (users) in order to examine whether actual news and data are circulated or whether there is some indication of "fake news". One such indicator is the actual identity of the node. According to Gorodnichenko, Pham & Talavera [13], in social media one can identify real ("human") users but also social bots, computer algorithms used to produce automated content. Furthermore, special interest groups may be lobbying in favor of institutions, companies or even states, trying to spread information and a sense of consensus in society that is favorable to a given candidate or outcome. If the node corresponds to a news agency, a known journalist, a well-respected institution or a politician, etc., then this node is less prone to spreading fake news (although post-truth politics is another discussion).
In Table 4, we rank the 10 most prominent nodes according to their betweenness centrality (not including their actual score) within each of the three networks. We also include a column representing their actual identity (where possible; N/A means Not Available). We retrieved identities from the users' Twitter accounts. From Table 4, an immediate observation is that in the ENGLISH (filtered) network, among the ten most prominent nodes are three politicians (two from FYROM and one from Bulgaria) and two news agencies. Only one node seems to be a bot, since the user no longer exists. Another observation is the lack of any Greek politician or news agency. In this network there seems to be an ordinary mix of user categories, so it is rather safe to consider that, with the exception of the fourth node, no other node spreads fake news.
One more possible bot is found in the FYROM network at rank 7 (a user who posted a high number of tweets in just a small period of time and then vanished).
No bot is found among the high-ranking nodes of the GREEK network. The GREEK and FYROM networks have similar types of users: a politician each, a newspaper and one citizens' movement in the GREEK network, and many citizens. It should be noted, however, that in a number of cases, especially in the FYROM network, some users denoted as "citizens" are highly active with respect to their absolute number of tweets, as seen in their accounts. This might imply that they are not simple citizens but perhaps political or civil movements.

Some Content Analysis
In this section we use the analysis of word adjacencies, or word pairs, a well-known technique used to uncover content in texts. According to a number of researchers (see [14] for a survey), the recurring existence of specific word pairs within a text can be used to create a network of words, that is, a semantic network. In these networks words are the nodes and a link is drawn between two words when they appear sequentially. The semantic network produced has weights on its links: whenever a pair is identified many times in a text, the corresponding link becomes stronger. All relevant SNA processes can then be carried out in such networks, including word rankings, centralities (especially ones that take link weight into account, like PageRank) and community detection. The above approach can be applied in our case, but with special care, since tweets are very small texts by default and hence "compressed" in a sense. Users try to restrain themselves by omitting words that do not convey messages; however, preprocessing is still important in order to remove common words that obscure the content. NodeXL has the ability to identify word pairs and subsequently create semantic networks, after proper preprocessing such as the removal of nodes that have no meaning (user names, "rt", etc.).
In Figures 5-7 we present these semantic networks. We draw our nodes with sizes proportional to their betweenness centrality and color them according to the community they belong to. We do not display all words calculated, in order to avoid unnecessary "noise". In our case, for the ENGLISH network we chose to keep word adjacencies occurring more than 30 times, for the GREEK network more than 10 times and for the FYROM network more than 20 times, thus preserving a loose proportion to the volume of the original networks.
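The word-pair construction and the occurrence threshold described above can be sketched as follows. This is a simplified illustration, not NodeXL's procedure: the stop-word list, the sample tweets and the helper names are hypothetical, and stop words are removed before pairing, so two words separated only by a stop word count as adjacent.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real analysis would use a fuller one
# for each language.
STOPWORDS = {"rt", "the", "a", "is", "of", "and", "to", "in"}

def word_pairs(text):
    """Return the sequential word pairs of one tweet after cleaning."""
    # Drop user names and links, which carry no semantic content.
    text = re.sub(r"(@\w+|https?://\S+)", " ", text)
    # Keep runs of letters only; this also works for Greek and Cyrillic.
    words = [w for w in re.findall(r"[^\W\d_]+", text.lower())
             if w not in STOPWORDS]
    return list(zip(words, words[1:]))

def semantic_edges(tweets, min_count):
    """Weighted word-pair edges occurring more than min_count times."""
    counts = Counter(pair for tweet in tweets for pair in word_pairs(tweet))
    return {pair: n for pair, n in counts.items() if n > min_count}

tweets = [
    "RT @user1: Macedonia name deal signed",
    "Macedonia name deal is historic",
    "The name deal divides voters",
]
all_pairs = semantic_edges(tweets, min_count=0)
strong_pairs = semantic_edges(tweets, min_count=1)
```

The pair counts act as link weights, and raising `min_count` plays the role of the per-network thresholds (30, 10 and 20 occurrences) used to keep the drawings readable.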
Figure 5 confirms our previous observations about the conversation in the ENGLISH network. The main communities of words include topics related to the actual treaty and the fact that this has been a long dispute; some congratulations are offered, but there is also some talk about Alexander the Great. This is a moderate discussion, without too much polarization.
From Figure 6, we see that in the GREEK network there exists a rather extreme position against the deal. Actually, only a very small proportion of words relate to a so-called "responsible position", whereas in all other cases the discussion is about "treason" or even "retard voters" or the "Communist party's position" in the 1940s. In Figure 6 we have translated all Greek words to English, in order for the network to be internationally readable.
In order to create the FYROM semantic network (Figure 7), we again turned to Google Translate, kept the original word-node names and labeled them with the corresponding translation. In order to avoid possible misunderstanding we kept all word pairs without further noise reduction. In this semantic network one can observe all possible views on the subject, both supporting and opposing the deal. We also detect some discussion of the Greek views (some talk about Tsipras and Mitsotakis), as well as discussions on nationality and language. This network does not seem to be completely balanced, but it nevertheless contains many different opinions. Any further discussion of the actual content lies beyond the scope of our paper, since it would involve active politics, personal views and opinions and interfere with other scientific areas (political science, history, diplomacy, etc.).

Conclusions and Further Research
In this paper we deal with a topic that has created a lot of controversial discussion in Greece, FYROM and internationally, regarding the state of FYROM, its new name and the preliminary treaty signed in Prespes on the 20th of June, 2018, from the perspective of its impact on Twitter.
We collected a large number of tweets in three different languages (English, Greek and FYROM's language) and subsequently created networks of users that reply to, mention or retweet one another. We then inspected these networks, first in a macroscopic manner, calculating their main characteristics, then with regard to how they cluster into different communities and finally in a microscopic manner, with regard to actual users. Finally, we created semantic networks using word pairs (word adjacencies) and tried to extract the main topics of conversation in all three cases.
One general observation from the whole discussion is that the three networks vary with respect to the diversity of opinions found in both the community detection and the content analysis. The ENGLISH network contained the most balanced views, followed by the FYROM network and finally the GREEK network, which was extremely prejudiced on the negative side. This may convey people's actual choices within the states, but may also mean that Twitter as a social network is underused (or maybe used for other purposes) in Greece.
A closer look at the actual users reveals the possible existence of bots (in the ENGLISH and FYROM networks). However, the distribution of different users, especially in the ENGLISH case, should persuade us that, at least to a large extent, no fake news is circulated. In the GREEK network, many simple citizens seem to participate in discussions, while in the FYROM case citizens are again mostly represented, but in many cases these citizens are extremely active in absolute number of tweets, meaning that they might represent either groups or political movements.
One real drawback of research on current events is that things may change rapidly. On the 30th of September, 2018, a referendum was held in FYROM regarding this very issue of the Prespes Treaty. It is thus expected that a lot of news, opinions, propaganda, etc. circulated over Twitter in the time window before and after this referendum. We have already collected these tweets and plan to investigate them in a similar manner as in this paper, adding a type of network comparison in order to find out whether the structural properties of the networks have changed or even whether there is a shift in views. Still, the situation will continue to provoke discussions, at least until a final official international Treaty is signed and put into operation.