A Note on Densities of the L2 Mental Lexicon Network *

In this note, we give a supplementary explanation of the works of Takahashi and Tanaka, which are a kind of replication study of Wilks & Meara (2002) and which discuss the structure of learners' mental lexicon network from the viewpoint of psycholinguistics. Here we mainly discuss the mathematical structures that latently support their studies. Using some simple random graph techniques, we propose that the role of "densities" is important and useful under suitable assumptions. We then exhibit the effectiveness of our method by applying it to the experimental data of Takahashi and Tanaka, where some mathematically obtained inequalities help us to investigate aspects of the L2 mental lexicon network. Mathematical proofs and detailed discussion are collected in the final section, in which we also give a mathematical interpretation of some models that were examined by computer simulations in previous works.


Introduction
The basic aim in examining the "mental lexicon" is to investigate how humans organize a huge number of "words" in their mind. Many researchers in various areas are attracted to the structure of the mental lexicon and try to illustrate it. Many types of findings can be found in standard textbooks, for example (Aitchison, 1987). A network is what is called a graph. Here the set of vertices (in the terms of graph theory) corresponds to the set of words and the set of edges to the set of "associations", where an association means the existence of a strong link between two words.
The density of a given graph is defined as the ratio of the number of edges to the number of all possible edges, which equals $n(n-1)/2$, where $n$ is the number of words. Wilks & Meara (2002) investigated the difference between the densities of L1 and L2 in French. Their experiment consisted of a questionnaire of 40 questions (items), each of which presents 5 words chosen randomly from the 1000 most frequent words with the following instructions: 1) if no pair among the 5 words is thought to be associated, nothing should be written; 2) if one or more pairs among the 5 words are thought to be associated, only the pair with the strongest association should be circled.
We call the ratio of the number of items answered with an association to the number of all items the "hit rate".
We should remark that, through this "five-word task" experiment, we cannot obtain the structure of associations of the individual mental lexicon network, since under instruction (2) we can only see the most strongly linked pair of words; so to speak, we asked the participants to break the "detailed structure" of associations. However, it is important to observe the ratio of items with no association.
Once the mental lexicon network is constructed, the density $p$ behaves like the probability that an association exists between any pair of words in this network.
Under the assumption that associations between pairs occur independently with the identical probability $p$, the probability that an item shows no association is estimated as $(1-p)^{10}$, since the number of all possible pairs among 5 words is 10. Therefore, by observing the ratio of items with no association, we can obtain the density $p$. We must be very careful in treating items with an association, for which participants were forced to indicate one pair only. Of course, we know that instruction (2) may be suitable for an actual experiment for psychological reasons on the participants' side. If such an experiment were carried out on subjects without human emotions, for example in computer simulations, the restriction in (2) would not be needed. Namely, if participants were instead asked to indicate all associated pairs, it would be somewhat difficult for them to choose "no association" in an item just after indicating "many associations".
We emphasize again that it is important to obtain the number of items with no association as accurately as possible.
$G(n, p)$ denotes a random "undirected" graph on $n$ vertices in which every possible edge occurs independently with probability $0 < p < 1$. See also Figure 1.
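As a rough illustration of the statement that an item of the five-word task shows no association with probability $(1-p)^{10}$, the following sketch samples one undirected random graph $G(n, p)$ and compares the simulated hit rate with the analytic value. The function names, the graph size 300, the density 0.02 and the number of items are illustrative assumptions and are not taken from the experiments.

```python
import random
from itertools import combinations


def simulate_hit_rate(n_words, p, n_items, k=5, seed=0):
    """Estimate the hit rate of the k-word task on one sample of G(n_words, p)."""
    rng = random.Random(seed)
    # Sample an undirected random graph: each unordered pair (u, v), u < v,
    # becomes an edge independently with probability p.
    edges = set()
    for u, v in combinations(range(n_words), 2):
        if rng.random() < p:
            edges.add((u, v))
    hits = 0
    for _ in range(n_items):
        # Sorting guarantees that each pair is looked up as (smaller, larger).
        words = sorted(rng.sample(range(n_words), k))
        # An item is a "hit" if at least one of its C(k, 2) pairs is an edge.
        if any(pair in edges for pair in combinations(words, 2)):
            hits += 1
    return hits / n_items


def analytic_hit_rate(p, k=5):
    """Probability that at least one of the C(k, 2) pairs is an edge."""
    return 1.0 - (1.0 - p) ** (k * (k - 1) // 2)


if __name__ == "__main__":
    p = 0.02  # hypothetical density
    print(simulate_hit_rate(300, p, 20000), analytic_hit_rate(p))
```

The two printed values should be close but not identical, since the simulated value depends on the single sampled graph and on the randomly drawn items.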
Wilks and Meara (Wilks & Meara, 2002) derived the density from many computer simulations, generating regular random graphs and observing the existence of edges in randomly selected subgraphs of 5 vertices. However, we emphasize that the relationship between a hit rate and its density can be obtained analytically; its concrete form will be seen in Proposition 2.1 in Section 2.2.2. Namely, once the hit rate is obtained, we can derive the density immediately. When we calculated the densities for the data in the paper of Wilks and Meara in this way, their evaluations were about half of ours. We imagine that there was a slight mistake or ambiguity in generating random "regular" graphs in their paper. They introduced all notions of graph theory in terms of "undirected" graphs: for instance, symmetric adjacency matrices representing the structure of undirected graphs, the density represented by the ratio of the number of edges to that of all undirected edges in the complete graph, and so on. So it is natural to read their results in the setting of "undirected" graphs. In fact, we re-examined their strategy using the statistical programming language R (R Core Team, 2019) and the graph-theoretical package igraph (Csardi & Nepusz, 2006), and our estimation seems to be right. However, the essence and importance of their paper do not suffer from such tiny errors; as will be seen in Remark 2.4, if we apply "directed" random graphs to the association task for 5 randomly chosen words in (Wilks & Meara, 2002), the estimations obtained in our way coincide with the results of their simulation.
Open Journal of Modern Linguistics

Basic Relations around Densities
As stated above, we first give a simple relation between the hit rate in the five-word task and the density.
Proposition 2.1. For the five-word task, we can easily find
$$\text{"hit rate"} = 1 - \left(1 - \text{"density"}\right)^{10}. \tag{1}$$
Thus, once the hit rate is given from actual experiments, we immediately obtain the density as
$$\text{"density"} = 1 - \left(1 - \text{"hit rate"}\right)^{1/10}. \tag{2}$$
Here the exponent "10" on the right-hand side of (1) comes from $\binom{5}{2}$ in the "five-word task", which is the number of all possible pairs in one question consisting of 5 randomly chosen words. Thus, if we use a questionnaire in which each question consists of $n$ randomly chosen words ($n \in \mathbb{N}$), that is, an "$n$-word task", the exponents on the right-hand sides of (1) and (2) are replaced with $\binom{n}{2}$ and $1/\binom{n}{2}$, respectively.
Before stating the next relation, we give a definition in terms of graph theory.
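Definition 2.2. The set of all words corresponds to the set of vertices $V(G)$ of a given graph $G$. Similarly, the set of all associations among words corresponds to the set of edges $E(G)$, and the density to the ratio of the number of edges of $G$ to that of the complete graph on $V(G)$:
$$p = \frac{|E(G)|}{\binom{n_{V(G)}}{2}}. \tag{3}$$
In the same way, for disjoint sets of vertices $A$ and $B$, we write
$$p_{AA} = \frac{|E(\langle A \rangle)|}{\binom{n_A}{2}}, \qquad p_{BB} = \frac{|E(\langle B \rangle)|}{\binom{n_B}{2}}, \qquad p_{AB} = \frac{|E(A, B)|}{n_A n_B},$$
where $n_U$ is the number of elements of the set $U$, $\langle U \rangle$ is the subgraph of $G$ induced by the set of vertices $U$, and $E(A, B)$ is the set of edges connecting the two sets of vertices $A$ and $B$.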
The following relation is trivial from Definition 2.2 but important in this paper. It may be regarded as a kind of mean-field-theoretic approximation; the left-hand side of the equation describes a "one-body system" A + B and the right-hand side a "two-body system" of A and B.
Proposition 2.3. For the density $p_{A+B}$ of the whole graph, the densities $p_{AA}$ and $p_{BB}$ within $A$ and within $B$, and the density $p_{AB}$ between $A$ and $B$, we have
$$\binom{n_A + n_B}{2}\, p_{A+B} = \binom{n_A}{2}\, p_{AA} + n_A n_B\, p_{AB} + \binom{n_B}{2}\, p_{BB}. \tag{4}$$
In actual experiments, we may say that (4) becomes a kind of assumption on the derived or underived "small" densities, satisfied in the sense of a mean-field-theoretic approximation. In particular, in Section 3 we discuss the relationship among densities in our "five-word task" experiments under the relation (4) for simplicity. For more details about the relation (4) in the "k-word task" experiments, refer to Section A3 in the Appendix.
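The edge-counting relation between the density of the whole graph and the three densities $p_{AA}$, $p_{AB}$ and $p_{BB}$ can be checked numerically. The sketch below is only an illustration: it assumes that the relation reads $\binom{n_A+n_B}{2} p_{A+B} = \binom{n_A}{2} p_{AA} + n_A n_B p_{AB} + \binom{n_B}{2} p_{BB}$, and the function names, group sizes and densities are hypothetical.

```python
import random
from itertools import combinations
from math import comb


def random_two_group_graph(n_a, n_b, p_aa, p_ab, p_bb, seed=0):
    """Sample edges; vertices 0..n_a-1 form Group A, the remaining ones Group B."""
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(n_a + n_b), 2):
        in_a = (u < n_a) + (v < n_a)  # number of endpoints lying in Group A
        p = p_aa if in_a == 2 else p_ab if in_a == 1 else p_bb
        if rng.random() < p:
            edges.append((u, v))
    return edges


def group_densities(n_a, n_b, edges):
    """Return the empirical densities (p_AA, p_AB, p_BB, p_A+B) of an edge list."""
    e_aa = e_ab = e_bb = 0
    for u, v in edges:
        in_a = (u < n_a) + (v < n_a)
        if in_a == 2:
            e_aa += 1
        elif in_a == 1:
            e_ab += 1
        else:
            e_bb += 1
    return (e_aa / comb(n_a, 2), e_ab / (n_a * n_b),
            e_bb / comb(n_b, 2), len(edges) / comb(n_a + n_b, 2))


def averaged_density(n_a, n_b, p_aa, p_ab, p_bb):
    """Right-hand side of the edge-counting relation divided by C(n_a + n_b, 2)."""
    total = comb(n_a + n_b, 2)
    return (comb(n_a, 2) * p_aa + n_a * n_b * p_ab + comb(n_b, 2) * p_bb) / total


if __name__ == "__main__":
    edges = random_two_group_graph(30, 50, 0.3, 0.1, 0.05, seed=1)
    p_aa, p_ab, p_bb, p_all = group_densities(30, 50, edges)
    print(p_all, averaged_density(30, 50, p_aa, p_ab, p_bb))
```

For empirical densities of a fixed graph the two printed numbers agree up to rounding, because every edge lies in exactly one of the three classes.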
Remark 2.4. Throughout this paper, we treat "undirected" random graphs only. As stated in the previous subsection, "undirected" random graphs seemed to be treated in (Wilks & Meara, 2002). However, seeing the differences between our estimations and their results and referring to the subsequent works of Meara et al. (Wilks, Meara, & Wolter, 2005; Meara, 2007), we confirm that they treated "directed" graphs, or "digraphs", in their simulations. Now let us show the relationship between a hit rate and a density in the case of digraphs in a way similar to the above. Let $D$ be a digraph on $n$ vertices and $A(D)$ the set of all its directed edges, or arcs, where $(x, y)$ is an arc from $x$ to $y$. In this setting, it is natural to regard $x$ and $y$ as associated if $(x, y)$ or $(y, x)$ belongs to $A(D)$. We set the density $p$ to be the ratio of the number of arcs to the number $n(n-1)$ of all possible arcs. Under the assumption that arcs occur independently with the identical probability $p$, the probability that no association occurs between $x$ and $y$ is estimated as $(1-p)^2$. Moreover, for a task with $n$ randomly selected words, that is, the "$n$-word task", we can easily find
$$\text{"hit rate"} = 1 - (1 - p)^{n(n-1)}. \tag{5}$$
Thus, once the hit rate is given from actual experiments, we immediately obtain the density $p$ as
$$p = 1 - \left(1 - \text{"hit rate"}\right)^{1/(n(n-1))}. \tag{6}$$
When $n = 5$, we can see that the formula (6) almost recovers the results of simulations in Table 1 and Figure 3 in (Wilks & Meara, 2002). In their subsequent works (Wilks, Meara, & Wolter, 2005; Meara, 2007), they treat digraphs without ambiguity and introduce a modified model considering associations linked "indirectly" in some way. We give mathematical interpretations of some of their models and simulation results in Section A6 in the Appendix. We remark again that we treat "undirected" random graphs only in this paper.
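The following sketch contrasts the two ways of converting a hit rate into a density: the undirected setting with $\binom{n}{2}$ possible pairs per item, and the directed setting with $n(n-1)$ possible arcs per item. The function names and the hit rate 0.2 are hypothetical and serve only to illustrate why the directed estimate is about half of the undirected one when densities are small.

```python
def density_undirected(hit_rate, n_words=5):
    """Invert hit_rate = 1 - (1 - p) ** C(n, 2) for an undirected random graph."""
    pairs = n_words * (n_words - 1) // 2
    return 1.0 - (1.0 - hit_rate) ** (1.0 / pairs)


def density_directed(hit_rate, n_words=5):
    """Invert hit_rate = 1 - (1 - p) ** (n * (n - 1)) for a random digraph."""
    arcs = n_words * (n_words - 1)
    return 1.0 - (1.0 - hit_rate) ** (1.0 / arcs)


if __name__ == "__main__":
    h = 0.2  # hypothetical hit rate, not a value from the experiments
    p_u = density_undirected(h)
    p_d = density_directed(h)
    # For small densities the ratio is close to 2.
    print(p_u, p_d, p_u / p_d)
```

The two estimates are linked exactly by $(1 - p_{\text{directed}})^2 = 1 - p_{\text{undirected}}$, since a pair of words is unassociated in the digraph only if both possible arcs are absent.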

The Study of Takahashi and Tanaka
The studies in (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019) report on a word association task for Japanese learners of English as an L2. The English words used in our word association task were chosen from The New JACET List of 8000 Basic Words (JACET, 2016), which is a list of essential English vocabulary for university students in Japan. In total, we chose 1090 words essential for junior high school students and 1744 words essential for senior high school students; hereinafter the former set of words is called "Group A", the latter "Group B", and the whole set of words "Group A + B". The format of our word association task was similar to the one performed in Wilks and Meara (Wilks & Meara, 2002), except that we used English words instead of French words. In the experiment, we presented a set of five words in each trial. See also Figure 2. A word association test contained 80 word sets as follows: in word sets 1 to 40, we selected 5 words randomly from Group A + B; in word sets 41 to 80, we chose 5 words only from Group B; the two types of word sets appeared in a random order in each questionnaire. Participants were asked to identify a single association, if any, in each given set of five randomly chosen words. Then we measured the "hit rate", the ratio of word sets in which an association was identified.
An example and the flow of our actual experiment can be found in Section A1.
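The design of the questionnaire described above (40 word sets drawn from Group A + B, 40 drawn from Group B only, presented in random order) can be sketched as follows. This is a minimal illustration, not the procedure actually used in the studies: the function names are hypothetical, and the word lists in the usage example are placeholders rather than the JACET words.

```python
import random


def build_questionnaire(group_a, group_b, n_sets_each=40, set_size=5, seed=None):
    """Return a shuffled list of (kind, words) items of the two types."""
    rng = random.Random(seed)
    whole = list(group_a) + list(group_b)
    only_b = list(group_b)
    items = []
    for _ in range(n_sets_each):
        items.append(("A+B", rng.sample(whole, set_size)))   # drawn from Group A + B
    for _ in range(n_sets_each):
        items.append(("B", rng.sample(only_b, set_size)))    # drawn from Group B only
    rng.shuffle(items)  # the two types appear in a random order
    return items


def hit_rates(items, answers):
    """answers[i] is the chosen pair for item i, or None for 'no association'."""
    counts = {}
    for (kind, _), answer in zip(items, answers):
        hits, total = counts.get(kind, (0, 0))
        counts[kind] = (hits + (answer is not None), total + 1)
    return {kind: hits / total for kind, (hits, total) in counts.items()}


if __name__ == "__main__":
    group_a = [f"a{i}" for i in range(1090)]  # placeholder words
    group_b = [f"b{i}" for i in range(1744)]  # placeholder words
    items = build_questionnaire(group_a, group_b, seed=1)
    print(len(items), items[0])
```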
In (Takahashi & Tanaka, in preparation), the participants were five Japanese undergraduate students whose proficiency level of English was intermediate (their TOEIC Listening and Reading score was approximately 600), and five Japanese researchers who used English on a daily basis in their research. The data of the students showed that the mean hit rate was higher when the words from Group A were included, suggesting that the words in Group A might be functioning as hub vertices that help link the words in the lexical network of their L2 mental lexicon. See also Figure 3. This tendency was not observed in the test results of the researchers. See also Table 1.
In Tanaka & Takahashi (2019), the participants were thirty-two students who were native speakers of Japanese.
Table 1. Result in (Takahashi & Tanaka, in preparation).
For interpretation and analysis of these results in terms of (psycho)linguistics, refer to (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019).
We should remark that the young participants selected for our experiments in (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019) are intermediate-level learners of English at a Japanese university. We believe that participants should be neither too advanced nor too elementary when investigating the standard structure of the L2 mental lexicon for the majority of learners. In this sense, our participants might be quite suitable.

Taste of Scale-Free Network
The concept of complex networks, or scale-free networks, has relied on the construction of network evolution models. Most studies on complex networks, which are based on (Barabási, 2009; Barabási & Albert, 1999), discuss the topological statistics of the evolving network.
In our studies, we cannot evaluate such properties since the number of vertices, or equivalently of words, is very large but fixed. However, we imagine that the mental lexicon network of an L2 learner may evolve like a complex network evolution model as the learner acquires more and more new words. In this sense, we can say that we see a snapshot of this evolution and some trend as a complex network. Thus, through the actual experiment, we would find the existence of "hubs", which correspond to "important words" in the process of acquiring words.
Let us assume that Group A is the set of "more important" words in the whole set of words Group A + B. If this assumption is true, we should find significant differences between the hit rates of Group A + B and Group B; moreover, the network involving Group A should be denser than that involving Group B only. In other words, the significant differences can be said to be in favour of a complex network in the evolution model of L2 learners' mental lexicon. However, we cannot state that the mental lexicon forms a scale-free network, because we have not shown the scale-free properties (Barabási, 2009; Barabási & Albert, 1999; Schur, 2007; Vitevitch, Goldstein, Siew, & Castro, 2014), for example, a power-law distribution of the degrees of vertices. At this stage, all we can say is that the L2 mental lexicon network may possibly be a kind of scale-free network. Here let us recall that our main purpose is to show that the whole set of words can be divided into more and less important groups and to propose some modelling for our experiment.

Latent Mathematical Ideas
As is seen in Section 3.1, the data of the students show that the mean hit rate in Group B is lower than that in the whole set of words Group A + B. Thus the density of word associations in Group A is expected to be higher and to affect that of Group A + B. However, this result alone is not sufficient to state that "the set of basic words", Group A, plays an important role in the whole set and acts like "hubs".
A natural question that may arise is why they carried out two types of experiments, Group B only and Group A + B, and not another type, Group A only. There are two answers. One is that the total number of questions would exceed 100, which seems too many to maintain the quality of the participants' answers (Dörnyei & Taguchi, 2010; Gillham, 2008). The other is that, using only the data obtained so far, we can derive some qualitative expressions showing that the words in Group A are more important. These expressions can be seen in (Takahashi & Tanaka, in preparation).
Definition 3.1. We define two probabilities $p_A^*$ and $p_B^*$.
Remark 3.2. As is seen in (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019) or in Table 1, the estimates below rely on (4) in Proposition 2.3, which is, so to say, the general assumption, and on the fact that $p_{A+B}$ and $p_{BB}$ are given.
We should remark that the restriction on $p_{AA}$ becomes weaker for the conclusions as $n_B$ becomes larger relative to any fixed $n_A$ in both Propositions 3.3 and 3.5. Furthermore, the next corollary shows that the condition in Proposition 3.5 is stronger than that in Proposition 3.3.
Let us apply these estimates to the results of the experiments in (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019). Here we use the statistical programming language R (R Core Team, 2019) for numerical calculations throughout this paper.

In Preliminary Study of Takahashi and Tanaka
In the preliminary study of Takahashi and Tanaka (Takahashi & Tanaka, in preparation), the obtained values satisfy (12) in Proposition 3.4.
We recall that exact values of $p_{AA}$ and $p_{AB}$ cannot be obtained. However, by using (34) and (36) in the Appendix and the information on "the number of each type in answered pairs" in Table 1, we can estimate $p_{AA}$ and $p_{AB}$ from below:
Claim 3.7. We have lower bounds for $p_{AA}$ and $p_{AB}$ expressed in terms of AA, AB and BB, where AA, AB and BB are the numbers of answered pairs in the experiment connecting two words both in Group A, one word in Group A and the other in Group B, and two words both in Group B, respectively.
Owing to the requirement (2) in Subsection 2.1 for the experiments, "hidden" associations, which may exist, cannot be counted. Thus we only have inequalities such as those stated above.

In Further Study of Tanaka and Takahashi
In a further study of Tanaka and Takahashi (Tanaka & Takahashi, 2019), for the 32 students, there was also a significant difference between the mean hit rates in Group A + B and in Group B. The obtained values are examined with (11) and (15), and they do not satisfy (12).

Modelling towards Actual Experiments
Assumption 3.8. We assume
$$p_{AB} = \sqrt{p_{AA}\, p_{BB}}, \tag{28}$$
where $p_{AA}$, $p_{BB}$ and $p_{AB}$ denote the density (probability) connecting two words within Group A, within Group B, and between Groups A and B, respectively.
Let us give a brief explanation of Assumption 3.8 stated above. The relation (28) corresponds to the Lorentz-Berthelot rules, or the Berthelot rule, which is well known as one of the basic combining rules in computational chemistry and molecular dynamics. In our context, it is natural to regard every word in Group A and Group B as a particle belonging to one of two categories of identical particles, A and B, respectively. The three densities $p_{AA}$, $p_{BB}$ and $p_{AB}$ can therefore also be regarded as the interaction forces within A, within B, and between A and B, respectively. In this sense, it is natural to impose the relation (28). The densities $p_{AA}$ and $p_{AB}$ are then determined by (28) and (4). Their explicit expressions in terms of $n_A$, $n_B$, $p_{A+B}$ and $p_{BB}$, which take somewhat complicated forms, can be seen in Subsection A5 in the Appendix. Calculated results are shown concretely in Table 3 and Table 4 in Subsections 3.4.1 and 3.4.2, respectively.
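To make the role of the combining rule concrete, the sketch below solves for $p_{AA}$ and $p_{AB}$ from given $p_{A+B}$ and $p_{BB}$. It rests on two assumptions that should be checked against the original formulas: that the Berthelot-type rule is the geometric mean $p_{AB} = \sqrt{p_{AA} p_{BB}}$, and that the edge-counting relation is $\binom{n_A+n_B}{2} p_{A+B} = \binom{n_A}{2} p_{AA} + n_A n_B p_{AB} + \binom{n_B}{2} p_{BB}$. Under these assumptions $x = \sqrt{p_{AA}}$ satisfies a quadratic equation. The function name and the numerical densities are hypothetical.

```python
from math import comb, sqrt


def solve_lorentz_berthelot(n_a, n_b, p_all, p_bb):
    """Return (p_AA, p_AB) assuming p_AB = sqrt(p_AA * p_BB) and the edge-counting relation."""
    # Quadratic a*x^2 + b*x + c = 0 in x = sqrt(p_AA).
    a = comb(n_a, 2)
    b = n_a * n_b * sqrt(p_bb)
    c = comb(n_b, 2) * p_bb - comb(n_a + n_b, 2) * p_all
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real solution for sqrt(p_AA)")
    x = (-b + sqrt(disc)) / (2 * a)  # the only root that can be non-negative
    if x < 0 or x > 1:
        raise ValueError("the given densities do not yield a valid probability p_AA")
    return x * x, x * sqrt(p_bb)


if __name__ == "__main__":
    n_a, n_b = 1090, 1744          # sizes of Group A and Group B
    p_all, p_bb = 0.019, 0.01      # hypothetical densities, not experimental values
    print(solve_lorentz_berthelot(n_a, n_b, p_all, p_bb))
```

A quick consistency check is to insert the returned pair back into the edge-counting relation and confirm that $p_{A+B}$ is recovered.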
As stated before Claim 3.7, we should take care in dealing with associated pairs in the participants' answers. The requirement imposed on all participants, which is suitable in the actual experiments as discussed in Subsection 2.1, is that they must indicate at most one associated pair per question. Because of this requirement, when two words are chosen as a pair in some item, we cannot judge whether it is the only pair in this item, whether it is a stronger pair than one or more other pairs, or whether it is one pair in a triangle or a more complicated subgraph. In other words, we cannot know how strong the answered pair is, nor whether other hidden pairs lie within Group A, within Group B, or between Group A and Group B. Namely, this requirement might break the structure of associations that may potentially exist in answering.
Table 3. Results summary for (Takahashi & Tanaka, in preparation).
For these reasons, many associations, about 18 associations, are considered to be hidden due to the requirement in one person's answering. On the other hand, the impression of almost all participants is that almost all answered associations were unique and that there were at most three questions having two or three associations (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019). The interpretation of this difference between the measured quantities and the participants' impressions is a further problem for (psycho)linguistics.
Although there exist some problems such as those stated above, it is plausible to consider the ratio among the three types of answered associations, AA:AB:BB, as a "criterion" for the ratio among the three types of associations that might appear in participants' minds. Moreover, each expected number of pairs per question derived from our experiments (cf.

For Preliminary Study of Takahashi and Tanaka
From the experiments, we obtain the results in Table 3. On the other hand, we can see no significant difference between the ratio AA:AB:BB of the students and the ratio among $Q_{AA}$, $Q_{AB}$ and $Q_{BB}$ (with their densities), nor between that of the adults and the ratio among $P_{AA}$, $P_{AB}$ and $P_{BB}$ (without their densities). Therefore we may say that the density of the adults would be uniform over all the words in Groups A and B, and that the densities derived in our modelling fit the data for the students in the sense that there exists no significant difference from the ratio among AA, AB and BB.

For Further Study of Tanaka and Takahashi
From the experiments, we obtain the results in Table 4. Then, combining this with the discussion in the previous Subsection 3.4.1, we may say that our densities for the students in our modelling fit the data in the sense that there exists no significant difference from the ratio among AA, AB and BB. Consequently, we presume that the Lorentz-Berthelot rules in (28) can be applied to the densities in the mental lexicon network of "young" L2 learners, which is considered to be in the process of evolution; for "mature" L2 learners, the mental lexicon network is considered to be almost complete. The details of these topics should be discussed in the context of linguistics or psychology.

Concluding Remarks
The main topics discussed in (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019) are explorations of the structure of the mental lexicon of L2 learners in the context of (psycho)linguistics. Though the concept of a random graph is introduced and applied there in an essential way, only rough ideas of the mathematical derivation are provided and the details are omitted. Thus one of the main topics of this paper is to give the explicit formulae and their proofs, which are seen in Section 3.3 and in Section A4, respectively. In addition, we propose a kind of modelling for the actual experiments in (Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019). We know that there exist many types of restrictions in carrying out actual experiments, although we always want to obtain sufficiently many kinds of data. Our proposal may be useful in the sense that much information is derived from the data actually obtained so far. For example, the "hit rate" in the n-word task gives us the "density" via a generalized form of Proposition 2.1 and Remark 2.4.
Here we discussed almost all topics in terms of mathematics, but many results should be interpreted and discussed in the context of (psycho)linguistics (cf. Takahashi & Tanaka, in preparation; Tanaka & Takahashi, 2019); we believe that many fruitful further studies will follow.

1) INPUT: The participant looks at 5 words per question. See also Figure A1.
2) Example of possibilities in a participant's mind: the total number of possibilities is $1024 = 2^{10}$. See also Figure A2.
3) Required condition: Choose at most one association (that is, two words) with the strongest link, even if two or more associations exist.
4) OUTPUT: Answer one association of two words, or no association. See also Figure A3.
5) Go to the next question, that is, return to 1) for the next one.
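The steps above can be sketched as a small function that maps the associations present in a participant's mind to the single answer allowed by the required condition. The representation of association strengths as a dictionary, the function name, and the strength values are hypothetical modelling choices made only for this illustration; the five words are those of the questionnaire example.

```python
from itertools import combinations
from math import comb

# Each of the C(5, 2) = 10 pairs is either associated or not,
# which gives 2 ** 10 = 1024 possible patterns per question.
N_PATTERNS = 2 ** comb(5, 2)


def answer_item(words, strength):
    """Return the most strongly associated pair among the words, or None.

    strength maps frozenset({w1, w2}) to a positive association strength;
    pairs that are absent from the mapping are regarded as unassociated.
    """
    linked = [pair for pair in combinations(words, 2) if frozenset(pair) in strength]
    if not linked:
        return None  # OUTPUT: no association
    # Required condition: only the strongest association is answered,
    # so all other associations in this item remain hidden.
    return max(linked, key=lambda pair: strength[frozenset(pair)])


if __name__ == "__main__":
    words = ["slice", "onto", "maybe", "test", "mental"]
    strength = {frozenset(("test", "mental")): 0.9,   # hypothetical strengths
                frozenset(("slice", "test")): 0.4}
    print(N_PATTERNS, answer_item(words, strength))
```

The second, weaker association in the example is never reported, which is exactly the loss of "detailed structure" discussed in the main text.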

A2. Counting and Probability
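Let $n_A$ and $n_B$ be the numbers of words in Group A and Group B, respectively. The event that both of two associated words belong to Group A in a given word set consisting of five words selected randomly from Group A + B is denoted by $E_{AA}$. Let us show that the probability $P(E_{AA})$ is given by
$$P(E_{AA}) = \frac{n_A (n_A - 1)}{(n_A + n_B)(n_A + n_B - 1)}.$$
Let $X$ be the random variable giving the number of words belonging to Group A in a word set, that is, in five randomly selected words. Then the probability $P(X = k)$ is
$$P(X = k) = \frac{\binom{n_A}{k} \binom{n_B}{5-k}}{\binom{n_A + n_B}{5}}, \qquad k = 0, 1, \dots, 5,$$
so that
$$P(E_{AA}) = \frac{\sum_{k=2}^{5} \binom{n_A}{k} \binom{n_B}{5-k} \binom{k}{2}}{\binom{n_A + n_B}{5} \binom{5}{2}}. \tag{32}$$
Figure A1. Example of a questionnaire in Step (1): slice, onto, maybe, test, mental.
Here we remark the following relation: for positive integers $n \ge m \ge l$,
$$\binom{n}{m} \binom{m}{l} = \binom{n}{l} \binom{n-l}{m-l}.$$
Thus we can express the numerator and denominator in (32) in closed form, where we use the fact that the total mass of the hypergeometric distribution is equal to 1. In the same way, we obtain
$$P(E_{AA}) = \frac{n_A (n_A - 1)}{(n_A + n_B)(n_A + n_B - 1)}, \quad P(E_{AB}) = \frac{2 n_A n_B}{(n_A + n_B)(n_A + n_B - 1)}, \quad P(E_{BB}) = \frac{n_B (n_B - 1)}{(n_A + n_B)(n_A + n_B - 1)}.$$
In the experiments in Takahashi & Tanaka (in preparation) and Tanaka & Takahashi (2019), $n_A = 1090$ and $n_B = 1744$.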
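The counting argument of this subsection can be verified with exact rational arithmetic. The sketch below assumes that the events $E_{AA}$, $E_{AB}$ and $E_{BB}$ refer to a pair chosen uniformly among the $\binom{5}{2}$ pairs of a word set, conditions on the number $X$ of Group A words, and compares the result with closed forms of the type $\binom{n_A}{2} / \binom{n_A+n_B}{2}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_x_equals(k, n_a, n_b, set_size=5):
    """Hypergeometric probability that exactly k of the selected words lie in Group A."""
    return Fraction(comb(n_a, k) * comb(n_b, set_size - k), comb(n_a + n_b, set_size))


def pair_type_probabilities(n_a, n_b, set_size=5):
    """Return (P(E_AA), P(E_AB), P(E_BB)) by conditioning on X = k."""
    pairs = comb(set_size, 2)
    p_aa = p_ab = p_bb = Fraction(0)
    for k in range(set_size + 1):
        w = prob_x_equals(k, n_a, n_b, set_size)
        p_aa += w * Fraction(comb(k, 2), pairs)             # both words in A
        p_ab += w * Fraction(k * (set_size - k), pairs)     # one word in A, one in B
        p_bb += w * Fraction(comb(set_size - k, 2), pairs)  # both words in B
    return p_aa, p_ab, p_bb


if __name__ == "__main__":
    n_a, n_b = 1090, 1744
    p_aa, p_ab, p_bb = pair_type_probabilities(n_a, n_b)
    print(float(p_aa), float(p_ab), float(p_bb))
    # Closed form for comparison.
    print(float(Fraction(comb(n_a, 2), comb(n_a + n_b, 2))))
```

Using `Fraction` avoids floating-point rounding, so the equality with the closed forms can be tested exactly.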

A3. Mean Field Theoretic Approximation
If we consider a mental lexicon network model with a unique probability $p$ of an association occurring, then the relations (1) and (2) hold, and its "density" coincides with $p$ via the "hit rate" derived from word association tasks. However, in a setting with two or more kinds of densities, the hit rate from the "k-word task" ($k \ge 2$) is given by another expression. Let us show it in our model exhibited in Section 2.2.2. The "hit rate" can be expressed explicitly as follows:
$$1 - (1 - p_{A+B})^{\binom{k}{2}} = 1 - \sum_{j=0}^{k} \frac{\binom{n_A}{j} \binom{n_B}{k-j}}{\binom{n_A + n_B}{k}} (1 - p_{AA})^{\binom{j}{2}} (1 - p_{AB})^{j(k-j)} (1 - p_{BB})^{\binom{k-j}{2}}, \tag{39}$$
where we set $\binom{s}{2} = 0$ for $s < 2$. In the above, the probability $p_{A+B}$ from (2) may be said to be the averaged "probability" in the whole set Group A + B; that is, the two-body system of Group A and Group B with $p_{AA}$, $p_{AB}$ and $p_{BB}$ is translated into the one-body system Group A + B with $p_{A+B}$. We should remark that $p_{A+B}$, which is expressed by $p_{AA}$, $p_{AB}$ and $p_{BB}$, does not coincide in general with the ratio of the number of edges in $G$ shown in (3), and that the relation (39) coincides with the general assumption (4) when $k = 2$. Referring to (35) in Section A2, we obtain
$$(1 - p_{A+B})^{\binom{k}{2}} = \sum_{j=0}^{k} P(X = j)\, q_j, \qquad q_j = (1 - p_{AA})^{\binom{j}{2}} (1 - p_{AB})^{j(k-j)} (1 - p_{BB})^{\binom{k-j}{2}}, \tag{40}$$
where $X$ now denotes the number of Group A words among the $k$ selected words and $q_j$ is the probability that no association occurs among "k words" selected randomly, which consist of $j$ words in Group A and $k - j$ words in Group B. The left-hand side of (40) is $1 - \binom{k}{2} p_{A+B} + o(p_{A+B})$, and the right-hand side of (40) is
$$1 - \sum_{j=0}^{k} P(X = j) \left[ \binom{j}{2} p_{AA} + j(k-j)\, p_{AB} + \binom{k-j}{2} p_{BB} \right] + o\left(\max\{p_{AA}, p_{AB}, p_{BB}\}\right),$$
where $o(\cdot)$ is the asymptotic little-o notation. Moreover, referring to (33) in Section A2, we can see that the sum in the last expression equals
$$\binom{k}{2}\, \frac{\binom{n_A}{2} p_{AA} + n_A n_B\, p_{AB} + \binom{n_B}{2} p_{BB}}{\binom{n_A + n_B}{2}},$$
which implies that the exact relation (40) in the "k-word task" for $k \ge 3$ can be well approximated by the relation (4) if all densities are sufficiently small. This is a reason why the relation (4) is set as the plausible general assumption throughout this paper.
Yu. Higuchi et al., Open Journal of Modern Linguistics
However, if some density in the network is not small, the relation (40) may no longer be approximated by the relation (4); that is, the averaged probability $p_{A+B}$ may be quite different from the "density" in the network, the ratio of the number of edges in a graph $G$ shown in (3). See an example in Section A6.2; see also (Meara, 2007).
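The quality of the mean-field approximation can be examined numerically. The sketch below assumes the hit rate of the $k$-word task in the form of a hypergeometric mixture of the no-association probabilities $(1-p_{AA})^{\binom{j}{2}} (1-p_{AB})^{j(k-j)} (1-p_{BB})^{\binom{k-j}{2}}$, converts it into an averaged density via the one-density formula, and compares it with the edge-count average. Function names and density values are hypothetical.

```python
from math import comb


def no_association_prob(k, n_a, n_b, p_aa, p_ab, p_bb):
    """Probability that k randomly selected words show no association at all."""
    total = comb(n_a + n_b, k)
    prob = 0.0
    for j in range(k + 1):  # j = number of selected words lying in Group A
        weight = comb(n_a, j) * comb(n_b, k - j) / total
        prob += (weight
                 * (1 - p_aa) ** comb(j, 2)
                 * (1 - p_ab) ** (j * (k - j))
                 * (1 - p_bb) ** comb(k - j, 2))
    return prob


def averaged_density_from_task(k, n_a, n_b, p_aa, p_ab, p_bb):
    """Averaged density obtained from the k-word hit rate via the one-density formula."""
    q = no_association_prob(k, n_a, n_b, p_aa, p_ab, p_bb)
    return 1.0 - q ** (1.0 / comb(k, 2))


def averaged_density_edge_count(n_a, n_b, p_aa, p_ab, p_bb):
    """Density of the whole graph obtained by counting edges of the three types."""
    total = comb(n_a + n_b, 2)
    return (comb(n_a, 2) * p_aa + n_a * n_b * p_ab + comb(n_b, 2) * p_bb) / total


if __name__ == "__main__":
    n_a, n_b = 1090, 1744
    for dens in [(0.004, 0.002, 0.001), (0.9, 0.05, 0.01)]:  # hypothetical densities
        print(dens,
              averaged_density_from_task(5, n_a, n_b, *dens),
              averaged_density_edge_count(n_a, n_b, *dens))
```

With small densities the two averages nearly agree, while with one large density they separate clearly, in line with the remark above.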

A4. Proofs of Inequalities for Our Estimation
The inequalities in Section 3.3 are derived from the relation (4), which connects $p_{A+B}$ with $p_{AA}$, $p_{AB}$ and $p_{BB}$ through $n_A$ and $n_B$. The case in which the inequality (12) holds is treated first. In addition, the expression in Equation (15) can be rewritten in terms of these quantities, which completes the proof.

A5. Exact Expressions of Densities under Assumption 3.8
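Under Assumption 3.8, substituting (28) into (4) gives
$$\binom{n_A + n_B}{2} p_{A+B} = \binom{n_A}{2} p_{AA} + n_A n_B \sqrt{p_{AA}\, p_{BB}} + \binom{n_B}{2} p_{BB}. \tag{56}$$
Simplifying (56) leads to a quadratic equation in $\sqrt{p_{AA}}$:
$$\binom{n_A}{2} \left(\sqrt{p_{AA}}\right)^2 + n_A n_B \sqrt{p_{BB}}\, \sqrt{p_{AA}} - \left[ \binom{n_A + n_B}{2} p_{A+B} - \binom{n_B}{2} p_{BB} \right] = 0. \tag{57}$$
Combining (57) with (28), we can get the desired expressions (52) and (53).
In the formula (59), we put $p = k/999$, where $k$ is the "number of links per word", in other words, the regular out-degree in the digraph; this formula (59) almost recovers the results of simulations in Table 3 in (Wilks, Meara, & Wolter, 2005). See Table A1.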

A6.2. For a Model in Meara (2007)
In Meara (2007), the following random digraph model is introduced as a kind of small-world model: 1) 1000 words are grouped into 20 clusters, each consisting of 50 words; 2) within these clusters, every word has random out-degree $k$ ($k \ge 3$); 3) among these clusters, 50 additional arcs are given. For details, see Section III.2 in (Meara, 2007). In terms of the probability of an arc occurring, it is obvious that $q = k/49$ within every cluster is much larger than $r = 1/1900$ among clusters; this implies that every cluster is almost isolated. For such a model, the results of "hit rates" in the task of five randomly selected words under computer simulations are given. Now let us interpret these results in our context and with our methods. First we divide into 7 cases based on the number and type of clusters to which the 5 selected words belong: case (1, 1, 1, 1, 1), case (2, 1, 1, 1), case (2, 2, 1), case (3, 1, 1), case (3, 2), case (4, 1), case (5).
1) For case (1, 1, 1, 1, 1): This means that there exist 5 clusters, each of which includes just one of the 5 randomly selected words. Then, considering "direct" and "indirect" associations in a way similar to that in the previous subsection, we can obtain the probability of an association occurring within the 5 words in this case; here any association occurs among clusters only and never within a cluster.
2) For case (2, 1, 1, 1): This means that there exist 4 clusters such that just one of them includes two words and each of the other three includes just one word. Then, considering "direct" and "indirect" associations within a cluster and among clusters, we can obtain the probability of an association occurring within the 5 words in this case.
As in (Meara, 2007), let us put $q = k/49$ and $r = 1/1900$ and observe every output value of (67) for $k = 3, 4, \dots, 20$. We immediately find that our Table A2 almost recovers the results of simulations in Figure 7 in Meara (2007).
Other simulation results may be recovered by a similar discussion.
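The division into 7 cases can be made quantitative by counting how many 5-word sets fall into each cluster pattern. The following sketch does only this combinatorial step for 20 clusters of 50 words; it does not reproduce the association probabilities of each case. The function and variable names are hypothetical.

```python
from collections import Counter
from math import comb, factorial

# Patterns of how 5 selected words are distributed over clusters.
PATTERNS = [(1, 1, 1, 1, 1), (2, 1, 1, 1), (2, 2, 1), (3, 1, 1), (3, 2), (4, 1), (5,)]


def pattern_count(pattern, n_clusters=20, cluster_size=50):
    """Number of 5-word sets whose cluster occupancy matches the given pattern."""
    # Ordered choice of distinct clusters, one for each part of the pattern.
    ways = 1
    for i in range(len(pattern)):
        ways *= n_clusters - i
    # Parts of equal size are interchangeable, so divide out their orderings.
    for multiplicity in Counter(pattern).values():
        ways //= factorial(multiplicity)
    # Choose the words inside each occupied cluster.
    for part in pattern:
        ways *= comb(cluster_size, part)
    return ways


if __name__ == "__main__":
    total = comb(1000, 5)
    for pattern in PATTERNS:
        print(pattern, pattern_count(pattern) / total)
```

Since every 5-word set belongs to exactly one pattern, the counts add up to $\binom{1000}{5}$, which gives a simple sanity check of the case division.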