A Consideration on Emotional Topology: Verbal Data Processing and Representation Applied to Athlete Statements

To analyze emotional states and their movement toward a desirable life, we considered a topological analysis of the mind for a representative group of people, who were, in this case, athletes. The utterances were acquired from online articles that included statements the athletes made during interviews. The sampled data were processed with TF-IDF, BLSOM, and a fuzzy clustering technique. The resultant mapping provides typical emotional states and movements along the extracted axes, so that one can discuss the athletes’ psychological transitions on this map. The proposed procedure was effective at presenting the players’ emotional interest and motivation.


Introduction

Background of the Present Study
States of mind vary considerably, so diagnosing them is a formidable task for researchers. To perceive an individual's state of mind, many visualization techniques have been developed in science, engineering, medicine, and other fields. In the present paper, we discuss a visualization technique with which to represent mental states and motions. Since emotion has no physical entity, the representation technique requires technical knowledge to draw analogies between an event and an emotion. In extant literature, we are able

Proposal of the Present Study
The present study was conducted as a quantitative analysis, applying text mining to the utterances of subjects in interview-based articles that appeared on the internet. Based on the utterances of the targets in the interviews, we tried to quantify the psychological state of each person. We propose a technique to visualize the relevance and time sequence of multidimensional data via transformation into two dimensions, by combining self-organizing maps (SOM; Kohonen, 1995; Kanaya et al., 2001), a nonlinear analysis method with unsupervised learning, with fuzzy cluster analysis, which is effective as a processing tool to quantify the ambiguity that is inherent in natural language (Shinkai, 2008).

Previous Studies

Text Mining with Verbal Analysis
First, let us describe text mining with verbal analysis. This technique is well known as a method of applying quantitative analysis to text data (e.g., speeches, descriptions, newspaper articles, sentences from novels, and other types of communication). The vector space method (Salton et al., 1975) is known as an attractive technique for searching and classifying documents. With this method, individual documents are characterized by vectors (the frequencies at which keywords appear as specific phrases in the document, referred to as document vectors in this study), which makes retrieval and classification possible. Since the number of keywords corresponds to the dimension of the vector (i.e., the number of components), the document vectors usually constitute multidimensional vector data. In order to classify and cluster such multidimensional data, dimension-reduction techniques have been used, such as latent semantic indexing (LSI; Deerwester et al., 1990), principal component analysis, factor analysis, and so on. These analyses reduce a high-dimensional document vector to a low-dimensional space, so that we can obtain a representation in two- or three-dimensional charts that provides an overview. Such processing is very effective and fruitful, but it inevitably involves a loss of essential information. In addition, since the document vectors quantified by text mining of interviews and similar sources are fundamentally qualitative data, we cannot expect a linear correlation and should suppose a more complicated correlation structure.
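The vector space representation described above can be illustrated with a minimal sketch. The keyword list and the three tokenized documents are hypothetical toy data, not taken from the study, and the helper names `document_vector` and `cosine` are introduced here only for illustration.

```python
import math
from collections import Counter

def document_vector(tokens, keywords):
    """Count how often each keyword occurs in one tokenized document."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical keywords and toy documents (assumptions for illustration only).
keywords = ["win", "goal", "mistake"]
docs = [
    ["we", "want", "to", "win", "the", "next", "game"],
    ["my", "goal", "is", "to", "win", "and", "win", "again"],
    ["that", "mistake", "cost", "us", "the", "game"],
]

# One document vector per document; its dimension equals the number of keywords.
vectors = [document_vector(d, keywords) for d in docs]
print(vectors)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Because the vector dimension grows with the keyword list, real document vectors quickly become high-dimensional, which is what motivates the dimension-reduction step discussed in the text.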
For this reason, we propose an effective combination of SOM and fuzzy cluster analysis. SOM is a nonlinear analysis method that makes it possible to visualize the classified result on a two-dimensional plane. An important advantage of SOM is that the data can be analyzed while maintaining the topological order of the data in the multidimensional space. In addition, by classifying the data on the two-dimensional map obtained as the output, it becomes possible to interpret the psychological state indicated for each individual and to extract primary emotional axes for psychological evaluation.
DOI: 10.4236/psych.2018.94054

Evaluation of Axes of Emotion
Previous studies (Wundt, 1924; Russell, 1980) revealed that emotions, as transient psychological states, can be arranged along two- or three-dimensional axes (i.e., "pleasure-displeasure" and "activation-deactivation," or "tension-relaxation" and "excitement-depression"). Sakairi et al. (2013) reported that a psychological scale based on these evaluation axes, as shown in Figure 1, was a useful way to quantitatively describe the psychological state of a subject and its time-sequential change. In such research, a psychological metric for measuring emotions is introduced, and the measurement results using the scale are reduced by factor analysis. Following this analysis, two- or three-dimensional evaluation axes have been extracted as important general factors in emotion.
Then, we can use them as an index to capture differences in emotion between groups and time-sequential changes in emotion.
In the present study, psychological features indicated by verbal data are quantified and analyzed as multidimensional document vectors. We propose a novel processing procedure in which the psychological state evaluation axes are not restricted to the above two- or three-dimensional space but are represented by the emotional base axes obtained by SOM. Consequently, the evaluation axes for the psychological state are determined from the distribution and transition of the output data deduced by the SOM analysis. We can therefore expect effective and efficient communication among investigators when exchanging information about the psychological state or transient change of a subject.

Psychological Evaluation of Athletes
As a first trial, we apply this process to utterances delivered by athletes. For athletes to undertake mental training, it is necessary to understand the skills they possess or lack, and also to understand their psychological state according to the circumstances (e.g., by conducting an interview or questionnaire assessment).
We developed an evaluation index of athletes' psychological-competitive ability by using a questionnaire and confirmed the effectiveness of that approach (Aoki et al., 2017). An analysis of athletes' utterances in interviews is expected to clarify their psychological characteristics in more detail. Thus, analyzing athletes' utterances in interviews is meaningful because it lets us visualize their psychological characteristics and share a common recognition of them with the athletes themselves, coaches, managers, and so on.
In an attempt to visualize the psychological state of a sports group, we can apply the findings of Takemura et al. (2014), who performed SOM clustering based on the mental and physical features of each player on a rugby team. In that study, the relationship between physical and psychological features and personal attributes (e.g., positions, regulars, and substitutes) was analyzed based on the clustering result. However, it does not cover a method for evaluating an individual psychological state based on certain criteria, or a method for visualizing its transition. As an attempt to capture the time-sequential variation of the psychological state of a group and its individuals, Kobayashi et al. (2016) distributed a psychological scale questionnaire to a basketball team and visualized the result for each player on a two-dimensional coordinate plane. Their study demonstrated effective visualization using a two-dimensional plane on which the psychological changes of both groups and individuals can be grasped visually. In the present study, the use of SOM makes it possible to make such changes visible as transitions of phase. Consequently, we can realize effective information transmission or communication that grasps the state and transition in the psychological phase space.

Concept of Processing Protocol
To analyze and visualize the verbal data, we proposed the procedure shown in Figure 2. The objects of analysis in the current inspection are athlete statements that are presented online. We did not aim to collect verbal data directly from athletes, because we wanted to treat naturally stated verbal data produced without awareness of being examined. In the second step, the modal words are selected from the collected words. The modal words significantly reflect the psychological state of the subject, so the researcher should choose them deliberately. In the third step, the frequency of modal words (TF: term frequency) is counted, then the inverse document frequency (IDF) is calculated from the data. The significance of TF and IDF in the present processing differs slightly from that in previous studies; the reason is explained later.
The next step has two processes, SOM and fuzzy cluster analyses. The input vectors for both types of processes are TF-IDF data. In fuzzy cluster analysis, assessment is focused on the representative vectors in the SOM output layer, and the neurons that constitute the output layer are clustered using these representative vectors. Consequently, we obtain the coordinates at which individual utterances are classified and the area division on the output layer. In the last step, we discuss the topological characteristics based on the coordinates; thus the features of clients (athletes) are obtained via psychological diagnostics. [...]mately the source of data is constructed with utterances $U_j$ ($j = 1, \ldots, NU$), where $NU = 736$, from interview articles.

Selection of Modal Words
As a preliminary language processing tool, morphological analysis by TTM (Matsumura & Miura, 2009) was applied to the source of data. In this analysis, the utterances in the original source were decomposed into morphemes (i.e., minimum units such as nouns, verbs, adjectives, adverbs, conjunctions, and so on). Through this processing, 4418 kinds of morphemes were extracted from $U$. After the decomposition, the 20 top-ranked words $M_i$ ($i = 1, \ldots, NM$), where $NM = 20$, were selected as modal words. Appearance frequency data of each modal word

Application of TF-DFs
We newly defined the TF-DF algorithm for quantitative evaluation. In this method, term frequency (TF; Luhn, 1957) and document frequency (DF), instead of inverse document frequency (IDF; Spärck Jones, 1972), are adopted; they represent the probability of a word's appearance and a weight coefficient of the word over the documents, respectively. Thus, TF-DF is a frequency metric weighted for each word. The arithmetic equations are as follows: [...] Here, $\sum_k f_{k,j}$ represents the total number of modal words in each utterance $U_j$, and [...] The TF-IDF method is considered effective at distinguishing documents by emphasizing the differences between them. In contrast, the TF-DF method can cluster documents by weighting common words and increasing the similarity between them. For that reason, we used the TF-DF method to weight the appearance frequency data for the fuzzy cluster analysis in the present study.
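A minimal sketch of TF-DF weighting follows. The defining equations are not recoverable from the text, so two assumptions are made and should be read as such: TF is taken as the count of a modal word divided by the total number of modal words in the utterance, and DF is taken as the fraction of utterances that contain the word. The function name `tf_df` and the toy counts are hypothetical.

```python
def tf_df(freq):
    """Weight a count matrix by TF-DF.

    freq[j][i] is the count of modal word i in utterance j.
    ASSUMPTION: TF[i][j] = freq[j][i] / (total modal words in utterance j)
    ASSUMPTION: DF[i]    = (number of utterances containing word i) / (number of utterances)
    """
    n_utterances = len(freq)
    n_words = len(freq[0])
    df = [sum(1 for row in freq if row[i] > 0) / n_utterances
          for i in range(n_words)]
    weighted = []
    for row in freq:
        total = sum(row)
        if total == 0:
            # An utterance without any modal word maps to the zero vector.
            weighted.append([0.0] * n_words)
        else:
            weighted.append([(f / total) * df[i] for i, f in enumerate(row)])
    return weighted

# Hypothetical counts for 4 utterances and 3 modal words.
freq = [
    [1, 0, 0],
    [2, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
H = tf_df(freq)
for row in H:
    print(row)
```

Under these assumptions, a word that appears in many utterances receives a larger weight, which is the opposite of IDF and matches the stated goal of increasing similarity between utterances that share common words.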

Self-Organizing Maps
SOM, a neural network algorithm proposed by Kohonen (1995), is an effective technique for nonlinear mapping that converts multidimensional data into a low-dimensional space. This mapping consists of a two-layered structure: an input layer and an output layer (competitive layer). Usually, the output layer is two-dimensional, such as a grid or lattice. Each lattice point is called a neuron, and vector data (the representative vector), whose dimension corresponds to that of the input vector, is allocated to each neuron.
Psychology, K. Aoki et al.
In a usual SOM algorithm, each input vector is compared with all representative vectors, and the neuron (lattice point) whose representative vector has the smallest distance from the input vector is assigned as the winner neuron. As a result, the input vectors are classified on the output layer so that input vectors with similar features are mapped onto the same neuron. Through iteration, the representative vector is updated so as to approach the input vectors. Such a sequential self-learning algorithm brings the representative vectors and the input vectors into closer correspondence. A variety of SOM algorithms exist. Some are designed to form clusters and extract features well; however, some of them depend on the initial input vector distribution. Such dependency is undesirable for our task, so we adopted the batch-learning type of SOM (Batch Learning SOM: BLSOM). We used the BLSOM analysis program "BLSOMviewer," which was created by Kanaya et al. (2001). The details of this analysis are described below.

1) Setting the initial value of representative vector
In order to assign the initial value of a representative vector to each neuron of the output layer, a principal component analysis was carried out with the input vectors $H_j$ ($j = 1, \ldots, NU$), which are defined as follows.
$H_j = (h_{1,j}, h_{2,j}, \ldots, h_{20,j})$. Here, we used a reduced expression $H$ for the 20 $i$-components of $h$.
Eigenvectors $a_1$ and $a_2$ of the first and second principal components, and standard deviations $\sigma_1$ and $\sigma_2$ of the first and second principal components, were obtained from this analysis. Using these values, the initial values of the representative vectors $W_{l,m}$, where $L = 6$ and $M = 5$, are calculated with the next equation (Kanaya et al., 2001): [...]
where $H_{AV}$ is the average of the input vectors over $j = 1, \ldots, NU$. $\lceil x \rceil$ and $\lfloor x \rfloor$ are the ceiling function, which gives the smallest integer greater than or equal to $x$, and the floor function, which gives the largest integer less than or equal to $x$, respectively.

2) Redistribution of input vectors
Each input vector is allocated to the neuron whose representative vector has the smallest distance from it in the 20-dimensional Euclidean space (the so-called winner neuron). In this way, the input vectors are redistributed on the map, and then the representative vectors are updated with the input vectors. The sequence is as follows. For the $j$-th input vector, $H_j$, suppose that the representative vector $W_{l,m}$ has the smallest Euclidean distance (i.e., norm in the modal-word space, here a 20-dimensional space). Thus, all of the vector elements [...] Here, $r$ is the iteration number, which is incremented each time the input vectors are redistributed and the representative vectors are updated.
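The redistribution step can be sketched as a nearest-neighbor search over the map. This is a minimal illustration, not the BLSOMviewer implementation: the function names, the 2 × 2 toy map, and the three-dimensional toy vectors are assumptions, and indices are 0-based here rather than 1-based.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def find_winner(h, W):
    """Return the (l, m) index of the neuron whose representative vector is nearest to h.

    W[l][m] is the representative vector of neuron (l, m); ties keep the first neuron found.
    """
    best_index, best_dist = None, math.inf
    for l, row in enumerate(W):
        for m, w in enumerate(row):
            d = euclidean(h, w)
            if d < best_dist:
                best_index, best_dist = (l, m), d
    return best_index

def redistribute(H, W):
    """Assign every input vector in H to its winner neuron."""
    return [find_winner(h, W) for h in H]

# Hypothetical 2 x 2 map with 3-dimensional representative vectors.
W = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
]
H = [[0.9, 0.1, 0.0], [0.0, 0.0, 0.8], [0.1, 0.0, 0.0]]
assignment = redistribute(H, W)
print(assignment)
```

In the study the same search would run over a 6 × 5 map in a 20-dimensional space; only the sizes change.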

3) Update of representative vector
After classification of all of the input vectors, the representative vector is updated according to the following equation: [...] $\alpha(r)$ also decreases as the iteration of learning increases; thus the representative vector gradually converges to the optimum value.
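Since the update equation itself is missing from the text, the following sketch uses the batch-learning form commonly attributed to BLSOM as an explicit assumption: each representative vector moves by a fraction $\alpha(r)$ toward the mean of the input vectors assigned to neurons within a square neighborhood of half-width $\beta(r)$. The decay schedules and the starting values `alpha1 = 0.6` and `beta1 = 2` are hypothetical choices for illustration.

```python
def alpha(r, T, alpha1=0.6):
    """ASSUMED learning-rate schedule: linear decay with a floor of 0.01."""
    return max(0.01, alpha1 * (1.0 - r / T))

def beta(r, beta1=2):
    """ASSUMED neighborhood half-width: shrinks by one per iteration, never below 0."""
    return max(0, beta1 - r)

def batch_update(W, H, assignment, r, T):
    """Return updated representative vectors after one batch step.

    W[l][m]       representative vector of neuron (l, m)
    H[j]          j-th input vector
    assignment[j] winner neuron (l, m) of H[j]
    """
    a, b = alpha(r, T), beta(r)
    new_W = [[list(w) for w in row] for row in W]
    for l, row in enumerate(W):
        for m, w in enumerate(row):
            # Inputs whose winner lies within the square neighborhood of (l, m).
            members = [H[j] for j, (wl, wm) in enumerate(assignment)
                       if abs(wl - l) <= b and abs(wm - m) <= b]
            if not members:
                continue  # No input nearby: leave this neuron unchanged.
            mean = [sum(h[i] for h in members) / len(members)
                    for i in range(len(w))]
            new_W[l][m] = [wi + a * (mu - wi) for wi, mu in zip(w, mean)]
    return new_W

# Hypothetical 1 x 2 map with 2-dimensional vectors.
W = [[[0.0, 0.0], [1.0, 1.0]]]
H = [[0.2, 0.0], [1.0, 0.6]]
assignment = [(0, 0), (0, 1)]
W_next = batch_update(W, H, assignment, r=5, T=10)
print(W_next)
```

Because all inputs are classified before any vector changes, the result does not depend on the order in which inputs are presented, which is the property the text gives as the reason for choosing batch learning.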

Fuzzy Cluster Analysis
Treating the representative vectors as the representative matrix $W$, we conducted fuzzy cluster analysis. In ordinary cluster analysis, it is assumed that an object belongs to exactly one cluster (hard clustering), whereas fuzzy cluster analysis assumes that an object's membership in any specific cluster is ambiguous. By using this concept, we can find the common properties among objects that have multidimensional components. First, we calculate the correlation between all pairs of elements of $W_{l,m}$, which is described by a value from 0 to 1. For the representative matrices [...], in calculating the cross-correlation (i.e., the similarity), we define the membership matrix $F$ by the following equation: [...]
where $T = L \cdot M$ in Equation (13). As shown here, the matrix $F$ has correlation values as components. In the next step, $F$ is transformed into a reachability matrix, $\tilde{F}$, which is determined by iterating the following calculation: [...]
Here, the function Max indicates the maximum value among the element products, which corresponds to the logical sum in regular reachability matrix calculus.
After $T = L \cdot M$ iterations, with a suitable threshold $R$ for $\tilde{F}$, the clustering is well processed, so we obtain several patterns in the SOM map field.
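The clustering procedure can be sketched under stated assumptions, since the defining equations are missing from the text. Cosine similarity stands in for the correlation in the membership matrix, the reachability matrix is computed by max-product composition as the description of the Max function suggests, and neurons are grouped by linking every pair whose reachability is at least the threshold. All function names and the four toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; lies in [0, 1] for non-negative vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def membership_matrix(vectors):
    """ASSUMPTION: pairwise cosine similarity, with 1.0 on the diagonal."""
    n = len(vectors)
    return [[1.0 if i == j else cosine(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]

def max_product_closure(F):
    """Iterate G <- max(G, G o F), where (G o F)[i][j] = max_k G[i][k] * F[k][j]."""
    n = len(F)
    G = [row[:] for row in F]
    for _ in range(n):
        nxt = [[max(G[i][j], max(G[i][k] * F[k][j] for k in range(n)))
                for j in range(n)] for i in range(n)]
        if nxt == G:
            break
        G = nxt
    return G

def clusters_at(G, R):
    """Group indices linked by reachability >= R, using union-find."""
    parent = list(range(len(G)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(len(G)):
        for j in range(i + 1, len(G)):
            if G[i][j] >= R:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(G)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical representative vectors of four neurons, flattened to a list.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
G = max_product_closure(membership_matrix(vectors))
print(clusters_at(G, 0.9))
print(clusters_at(G, 0.05))
```

Raising the threshold splits the map into more and smaller clusters, which is the behavior the text relies on when it compares clusterings at different values of $R$.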

Visualization of $W_{l,m}$ and $H_j$
As the result of the BLSOM analysis, the representative matrix $W_{l,m}$ was acquired from the input vectors $H_j$ after 100 learning iterations. In addition, fuzzy cluster analysis provides clusters according to the similarity of the representative matrix $W_{l,m}$. Using these processing methods, we propose a pattern-tracking technique to visualize the time-sequential transition of the psychological state of each athlete or client on a two-dimensional map. By considering the clustered map and tracing the trajectory of a player's state on it, we can find the emotional state and its movement or fluctuation for each athlete. The axes of emotional state are interpreted psychologically from the clustering data, and movement along them can be used to capture the athlete's emotional transition.
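The pattern-tracking idea can be illustrated with a short sketch: once each utterance of a player has been placed on the map, the ordered positions give a trajectory, the differences between consecutive positions give the motion, and the cluster labels give the state. The 3 × 2 label grid, the positions, and the function names are hypothetical and 0-based; they are not results from the study.

```python
from collections import Counter

def trajectory_steps(positions):
    """Displacement (dl, dm) between consecutive utterances on the map."""
    return [(b[0] - a[0], b[1] - a[1])
            for a, b in zip(positions, positions[1:])]

def cluster_sequence(positions, cluster_of):
    """Cluster label visited by each utterance, in utterance order."""
    return [cluster_of[l][m] for l, m in positions]

# Hypothetical cluster label of every neuron on a 3 x 2 map.
cluster_of = [
    [1, 1],
    [2, 3],
    [2, 3],
]
# Hypothetical winner neurons of one player's utterances, in order of appearance.
positions = [(0, 0), (1, 0), (2, 1), (2, 1)]

steps = trajectory_steps(positions)
states = cluster_sequence(positions, cluster_of)
dominant = Counter(states).most_common(1)[0]
print(steps)
print(states)
print(dominant)
```

Counting how often each cluster is visited gives the kind of summary used later in the text, where a player's input vectors are tallied by cluster.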

Calculation of Input Vector $H_j$
In this paper, we collected a total of 60 articles on $NC = 60$ players (one article per player) from web content. Each article includes 5 to 20 utterances, so we obtained $NU = 736$ utterances from the 60 players' articles. The set of all utterances $U$ was decomposed by TTM, generating 4418 kinds of morphemes. Through the frequency analysis, $NM = 20$ modal words, $M_i$, were selected in order of appearance probability, $e_i / NW$. BLSOM and fuzzy cluster analysis were then conducted, yielding the results described in the next section.

Clustering of $H_j$
Since the analysis output $\sigma_1 = 1.085$ and $\sigma_2 = 1.080$, we obtained that [...]. Therefore, the present BLSOM analysis yielded a 6 × 5 neuron matrix; that is, the input vectors $H_j$ ($j = 1, \ldots, NU$) were mapped onto 30 neurons. Fuzzy cluster analysis was carried out for $W_{l,m}$, so we were able to categorize the neurons by similarity. We now show some results obtained by changing the threshold to find a suitable clustering. After processing the reachability matrix, we obtained the following map. Figure 3(a) shows the color contour map of $\tilde{F}$. The reachability matrix expresses the connected networks among the elements, so some clusters stand out in the contour map. In the next step, we divide the map into clusters with the threshold $R$. For example, we can obtain 4 clusters with $R = 0.67$. [...] $W$ does not include any modal word. $W_{2,1}$ has components that are almost all zero, but the modal word "mistake" has a somewhat large value. Also, the diagonal clustering proceeds principally to the right and downward.
According to the classification, one can follow a major cluster and 4 small clusters among the 5 clusters with $R = 0.68$. The major cluster is subdivided into smaller clusters as the threshold progresses. Now, we should determine a suitable set of clusters from these classifications.
For adequate diagnosis, we selected a 10-cluster categorization with $R = 0.79$, in which the clusters are more uniformly scattered than in the others. In Figure 4, [...] $W$ is the origin of the mapping, so most of the data is included in this neuron, where the occurrence of the modal words is very poor during speech (i.e., the utterances do not indicate any detectable features).

Psychological Interpretation of $W_{l,m}$
Before discussing the characteristics of the 10 clusters in Figure 5 and Figure 6, we present a histogram of the vector components of $W_{l,m}$ in each cluster. The ordinate is the value relative to the averaged $H_{i,j}$ in each cluster for each modal word. The BLSOM analysis lets us find the predominant vectors in the respective clusters. These features of the clusters will be discussed in more detail in the next section.
The map is shown again in Figure 7 with some additional features: a major modal word in each cluster has been tagged. Referring to these words, we can distinguish at a glance positive and negative expression clusters; that is, CLs 6 and 7 are dominated by negative words and are located in the upper region, whereas the others are positive. CLs 8-10 present especially positive and rewarding emotional terms. In order to interpret this mapping psychologically, let us consider a kinematic analogy for emotion (i.e., the state of mind). Kinematics expresses motion through coordinates, velocity, and acceleration. By analogy with this principle, emotion should be described dynamically with a state and a motion. Namely, the state is denoted by a scalar and the motion by a vector. Here, we try to express the emotional transition with coordinates and vectors on the SOM map.
According to the BLSOM analysis, we introduced some major axes in the map.
Here, the solid arrows indicate scalar coordinates and the dashed arrows portray vector coordinates. As shown in this mapping, we can find "displeasure" and "pleasure" states in the upper row and right column, and "active," "tense," and "ebullient" states in the bottom row. Furthermore, considering the transitions between those states, the "deactivation and activation," "fighting spirit," and "spontaneity" vectors are discovered automatically. The definition is not unique, of course, so one can find another interpretation instead of the present expression.
We propose the expression shown in Figure 7 as an effective way to understand the emotional states and transitions of the athletes. Using it as a template for recognizing an athlete's emotional behavior, we discuss the players' psychological emotion in the final section.

Utterance Transition for Exemplary Players
In this section, we discuss emotional behavior according to the utterance transitions of some players. Figures 8(a)-(c) show the sample mappings of exemplary players. In this analysis, we do not mention the properties of a player, such as his/her sports events, category, and so on. [...] characterized by the repetition of modal words, such as "win" or "goal," in utterances.
The activation vector indicated by player #33 is closely related to his/her extrinsic motivation, as proposed by Deci & Ryan (1985); that is, these words do not imply an emotional interest in or enjoyment of the sport itself, but a benefit unrelated to the sport itself, as a substantial reward. Player #33 also shows an almost constant transition on the map, and thus a consistent attitude toward his/her competition, which may be encouraged by the extrinsic motivation.

Figure 1. Core affect and two-dimensional mood scale.

Figure 2. Processing flow for emotional topology representation.
[...] fulfills the following equations. The variable $\beta(r)$ is the learning parameter indicating the neighbor region of the winning neuron. It decreases as the iteration of learning increases.

[...] learning rate coefficient, so it is a parameter for determining the convergence in updating the representative vector $W_{l,m}$. $T$ is the number of learning iterations.

[...] $W$ expresses a characteristic pattern of the emotional distribution. To extract the features of $W_{l,m}$ [...] Figure 3(b) shows the original correlation map after binary processing with $R = 0.67$; here we can visually recognize 5 clusters. Through the above processing, we can obtain various clusterings by changing the similarity threshold $R$.
Figure 4 shows several examples illustrating the clustering of $H_j$. In this figure, six patterns are shown according to six different similarities. The field was clustered with $l$ in the primary eigenvalue direction and $m$ in the secondary eigenvalue direction. Here, the neuron of [...]

Figure 3. (a) Correlation distribution of $F$ and (b) clustering structure extracted in $\tilde{F}$.

Figure 8(a) shows the emotional transition of player #33.The circled number indicates the serial number of his/her utterance in the articles.Following the trajectory, one can see that the psychological change along the activation/deactivation axis dominates his/her subjects.The appropriate change is

Figure 8(b) shows a transition within player #47.The trajectory shows a departure within the displeasure zone.Seven out of twelve input vectors of this player are classified into CLs 6 and 7 on the output layer, which means that