A Statistical Theory of Language Translation Based on Communication Theory

We propose the first statistical theory of language translation based on 
communication theory. The theory is built on New Testament translations from 
Greek into Latin and into 35 other modern languages. When a text is translated into 
another language, all linguistic variables change numerically. To study the seemingly 
chaotic data that emerge, we model any translation as a complex communication channel 
affected by "noise", studied, for the first time for this channel, with the tools of 
communication theory. This theory deals with aspects of language more 
complex than those currently considered in machine translation. The input 
language is the "signal"; the output language is a "replica" of the input 
language, largely perturbed by noise that is nevertheless indispensable for conveying 
the meaning of the input language to its readers. We have 
defined a noise-to-signal power ratio and found that channels are differently 
affected by translation noise. Communication channels are also characterized by 
their channel capacity. The translation of novels is subject to more constraints than New 
Testament translations. We propose a global readability formula for 
alphabetical languages, most of which lack one, and conclude with a 
general theory of language translation which shows that direct and reverse 
channels are not symmetric. The general theory can also be applied to channels of 
texts belonging to the same language, both to study how the texts of a single author 
may have changed over time and to compare texts of different authors. In 
conclusion, a common underlying mathematical structure governing human 
textual/verbal communication channels seems to emerge. Language does not play the 
only role in translation; this role is shared with the reader's reading ability and 
short-term memory capacity. Different versions of the New Testament within the same 
language can even seem, mathematically, to belong to different languages. These 
conclusions are long-lasting because they are valid also for ancient Greek and Roman 
readers.

Table 1 lists the languages used in the NT translations (Matthew, Mark, Luke, John, Acts, Epistle to the Romans, Apocalypse), with the total number of words W, sentences S and interpunctions I; Table 2 reports the average number of characters per word C_P and the average numbers of words n_W, sentences n_S and interpunctions n_I per chapter (standard deviations in brackets). Last access to the indicated web sites was in the week of October 5 to 9, 2020. To avoid misuse of the results reported in Table 1 and Table 2, notice that the average values shown in Table 2 do not coincide with averages calculable from Table 1 because, in general, the average value of a ratio is not equal to the ratio of the total values. For example, for Greek the total number of words divided by the total number of sentences (i.e., an estimate of the average value of the variable "words per sentence") is, from Table 1, 100,145/4759 = 21.04, while the average over chapters of the ratio between the number of words and the number of sentences per chapter is 23.07 (Table 2).
The correlation r between the number of characters and the number of words is not reported because, as for Italian [28], r > 0.990 for all languages. Finally, notice that the lists of names (Genealogy) in Matthew 1.1-1.17 and in Luke 3.23-3.38 have been deleted so as not to bias the statistics of the linguistic variables. In the following sections we investigate all these variables in depth.

The Ideal Translation and the Real Translation
When a text written in a language is translated into a text written in another language, all linguistic variables change numerically. Besides the total number of words W, sentences S and interpunctions I, the other main linguistic variables are the numbers of words n_W, sentences n_S and interpunctions n_I per chapter. To them we add the number of characters per word C_P, words per sentence P_F, words per interpunction I_P and interpunctions per sentence M_F. We refer to this latter set of variables as the deep-language variables. All these variables of a language Y can be statistically compared to those of a reference language X (Greek) by calculating the correlation coefficient r between any couple of variables y of language/translation Y and x of the reference language/translation X (in the following, where no confusion is possible, we refer to a variable and to the language/translation with the same mathematical symbol), and their expected regression line (i.e., the relationship between averages):

y = m x, (1)

with m the slope of the line. Of course we expect, and it is so in the following, that no translation can yield r = 1 and m = 1, a case referred to as the ideal translation; in practice we always find r < 1 and m ≠ 1. The slope m measures the multiplicative "bias" of the dependent variable compared to the independent variable; the correlation coefficient measures how "precise" the linear fit is. Even though the ideal translation is never found, it is useful as a reference model to which real translations can be compared. In the following we refer to it as the self-translation channel. Figure 1 shows the scatterplot between n_W in Greek and n_W in the other languages listed in Table 1, with the regression lines (1); it shows in greater detail what is reported in Table 1 and Table 2.
(The correlation coefficient r between two variables x, y, with averages m_x, m_y and standard deviations s_x, s_y, is given by r = E[(x − m_x)(y − m_y)]/(s_x s_y).) Figure 1, left, shows the scatterplot of n_W in the languages of Table 1 against n_W in Greek, together with the regression lines (1); the black line is the line y = x and the red line is the regression line between Latin and Greek. Figure 1, right, shows the histogram of the difference ("error") between the actual number of words in a given translation and the number of words in that translation calculated from the regression line, for a given Greek value. We can notice that translations can use more or fewer words than Greek, and that Latin (red line) is one of the closest translations to Greek. Table 3 lists the values of the correlation coefficient r and slope m.
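As a concrete sketch, the two parameters that characterize each channel, r and m, can be computed from paired per-chapter counts. The chapter counts below are invented placeholders, not values from Table 1; only the procedure reflects the text.

```python
import math

# Hypothetical per-chapter word counts (NOT data from Table 1):
# x = reference language (Greek), y = a translation.
x = [412, 530, 610, 288, 745, 501, 390, 660]
y = [455, 570, 690, 310, 820, 540, 430, 720]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

# Correlation coefficient r (footnote definition)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
r = cov / (sx * sy)

# Slope m of the least-squares regression line y = m x; note that m = r * sy / sx
m = cov / sx ** 2

print(f"r = {r:.3f}, m = {m:.3f}")
```

With real data, r close to 1 and m above (below) 1 would indicate a translation that faithfully tracks Greek while using more (fewer) words per chapter.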
Latin is the translation best correlated with Greek (r = 0.994), Hebrew the worst (0.949).
According to the regression lines, i.e., to the relationship between the average values in Y for assigned values in X (Greek), the translations that reduce the number of words (regression lines below the 45˚ line y = x in Figure 1) mostly belong to the Balto-Slavic family, while the translations that increase this number belong to the Romance and Anglo-Saxon families (except Finnish); the range of m is reported in Table 3. Figure 1 also shows the histogram of the difference ("error") between the actual number of words in a given translation and the average number of words in that translation calculated from the regression line, for a given Greek value. The spread of these latter values makes r < 1. The probability density function deducible from Figure 1 can be modelled as Gaussian. Figure 2 shows the results concerning n_S. All languages/translations have more sentences than Greek, with slopes ranging from Latin (m = 1.123) to Haitian (m = 2.085), Table 3, therefore implying a multiplicative bias larger than the words bias and showing that translations have very different distributions of full stops and, in general, interpunctions, not only compared to Greek but also compared to each other. The correlation coefficients are all significantly lower than those concerning n_W, in the range 0.899 ≤ r ≤ 0.969, Table 3. All translations convey the same meaning but with different quantities of words and sentences. Figure 3 shows the results concerning n_I. Most translations use more interpunctions than Greek, with slopes ranging from Swedish (m = 1.107) to Haitian (m = 1.730), see Table 3, therefore implying, again, a multiplicative bias larger than that found for words and sentences. Interpunctions impact directly on readers' reading ability and short-term memory capacity. The correlation coefficient varies in the range 0.938 ≤ r ≤ 0.974. Figures 2 and 3 are organized like Figure 1: left, the scatterplots with the regression lines (1), the black line y = x and the red Latin-Greek regression line; right, the histograms of the error with respect to the regression line.
A larger spread can be noticed in the deep-language variables P_F, I_P and M_F (Tables 1-3). To study all these variables in depth, it is very useful to consider a translated text as the output of a communication channel fed by the original text. The characteristics of this channel (one for each stochastic variable) can give us more insight into the mathematical/statistical deep structure of alphabetical (and possibly all human) languages. Before doing so, in the next section we define a useful parameter, namely the noise-to-signal power ratio of a real translation channel compared to the ideal channel.

Noise-to-Signal Power Ratio and Its Universal Geometrical Representation
We characterize any translation and its linguistic stochastic variables as a complex communication channel, made of parallel channels, one for each variable, affected by "noise". The input language is the "signal"; the output language is a "replica" of the input language, largely perturbed by noise. From the point of view of the output language this noise is, of course, indispensable for conveying the meaning to the readers of the output language. To study these channels, we define a suitable noise-to-signal power ratio and use a geometrical representation borrowed from the author's design of deep-space radio links [31], also applied in [32]. This geometrical representation is universal.
Two variables y and x, linked by a regression line y = mx, where m is the slope of the line, are perfectly correlated if the correlation coefficient r = 1, and are unbiased if m = 1; in other words, if the regression line is y = x (the 45˚ line, m = 1) and all y-values lie on the line (r = 1). If these conditions are not met, we consider the variance of the difference between the regression-line values (m ≠ 1) and the ideal-line (y = x) values, at given x-values, as the "regression noise" power N_m, and the variance of the difference between the values not lying on the line and the regression line y = mx (r ≠ 1) as the "correlation noise" power N_r.
Let us apply these concepts to language translation. Having defined the variance s_x^2 of language x and the variance s_y^2 of language y, the difference y − x between the regression line of the real translation channel and that of the ideal channel is given by y − x = (m − 1)x; therefore the variance (or power) of the regression noise is given by:

N_m = (m − 1)^2 s_x^2. (2)

Then, the regression noise-to-signal power ratio R_m is given by:

R_m = N_m / s_x^2 = (m − 1)^2. (3)

Notice that in (3) what counts is the absolute difference |m − 1|, because R_m is an even function (a parabola) around m = 1.
According to the theory of regression lines [30], the fraction of the variance of y not accounted for by the regression line is (1 − r^2) s_y^2, which is the correlation noise power N_r. Therefore, the correlation noise-to-signal power ratio R_r is given by:

R_r = N_r / s_x^2 = (1 − r^2) s_y^2 / s_x^2. (6)

Now, because the two noise sources are disjoint, the total noise-to-signal power ratio of the channel is given by:

R = R_m + R_r. (7)

Since the slope of the regression line satisfies m = r s_y / s_x, i.e., s_y / s_x = m / r, by (3) and (6) R depends only on the two parameters m and r of the regression line (Table 3) and is given by:

R = (m − 1)^2 + (1 − r^2) m^2 / r^2. (8)

For each couple of the same variable, in Greek and in a translation, we can represent Equation (8) graphically by considering the variables (not to be confused with translations):

X = √R_m = |m − 1|, Y = √R_r = (m/r) √(1 − r^2), (9)

so that each channel is a point of the Cartesian plane at distance √R from the origin, and circles centered at the origin are loci of constant R; the corresponding signal-to-noise power ratio ρ = 1/R becomes infinite at the origin and decreases as the radius of the circle increases.
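The decomposition above can be sketched numerically. Assuming, as in the derivation, that the slope satisfies m = r·s_y/s_x, the total noise-to-signal ratio and its two components depend on m and r only; the (m, r) values below are of the order of those reported for Latin in Table 3 and are used purely for illustration.

```python
import math

def noise_to_signal(m, r):
    """Split the total noise-to-signal power ratio R into regression noise
    R_m (bias, m != 1) and correlation noise R_r (scatter, r != 1),
    assuming s_y / s_x = m / r as in the text."""
    R_m = (m - 1.0) ** 2                     # regression noise ratio, Eq. (3)
    R_r = (1.0 - r ** 2) * (m / r) ** 2      # correlation noise ratio, Eq. (6)
    R = R_m + R_r                            # disjoint noise sources, Eq. (7)
    X, Y = math.sqrt(R_m), math.sqrt(R_r)    # Cartesian coordinates, Eq. (9)
    return R_m, R_r, R, X, Y

R_m, R_r, R, X, Y = noise_to_signal(m=1.123, r=0.994)
print(f"R_m = {R_m:.4f}, R_r = {R_r:.4f}, R = {R:.4f}")
print(f"distance from the ideal channel = {math.sqrt(R):.4f}")
```

A point at (X, Y) lies at distance √R from the origin, so circles centered at the origin are loci of equal total noise; the ideal self-translation channel sits at the origin.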
As discussed in [31], among other features not of interest here, adopting the noise-to-signal power ratio instead of the more common signal-to-noise power ratio allows this graphical representation, which immediately shows how R_r and R_m, through their square roots, contribute to the total R, and which of the two pushes the translation away from the ideal self-translation.
In conclusion, the comparison between any couple of corresponding variables can be studied as a "communication channel" in which the input signal is the Greek text variable and the output signal is the translation variable. Compared to the ideal channel, the actual channel is noisy, always characterized by R > 0. Of course, as already noted, this indispensable "noise" is what actually makes the translation intelligible to the intended readers of the translated texts. In the next section we study these communication channels.

Linguistic Communication Channels
We compare, for each chapter, the numbers of words, sentences and interpunctions, and the so-called deep-language variables P_F, I_P and M_F, of the original Greek texts with those of another language. The values of the slope m of the linear model (1) and of the correlation coefficient r for all variables and translations can be read in Table 3.
From these data we can calculate the coordinates X and Y and the total noise-to-signal power ratio R.
Let us first consider the words channel n_W. Figure 7 shows the results obtained according to the geometrical representation discussed in Section 4. The closer a point is to the origin, the less noisy the channel, i.e., the closer the communication channel is to the ideal channel. Latin, Basque, Russian and Croatian are the least noisy languages (the black circles will be discussed in the section on channel capacity). A regression line Y = aX + b with a > 0, as Equation (10), is due to languages with m > 1, while a regression line with a < 0 is due to languages with m < 1.
From Equation (10) it turns out that, even though some translations can be practically unbiased (m ≈ 1), as is the case of Slovak, they can never be perfectly correlated with the Greek texts, i.e., their correlation coefficient can never approach 1. In fact, when m = 1, i.e., X = 0, from Equation (10) we get Y = 0.157 and, by setting m = 1 in Equation (6), we can calculate the corresponding "irreducible" (minimum) correlation coefficient r = 1/√(1 + Y^2) ≈ 0.988. This value has to be compared with the minimum value 0.949 of Hebrew (Table 3).
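The irreducible correlation coefficient follows directly: with m = 1, Equation (6) (together with s_y/s_x = m/r) gives Y^2 = (1 − r^2)/r^2, which can be inverted for r. A quick check with the intercept Y = 0.157 of Equation (10):

```python
import math

# Invert Y^2 = (1 - r^2) / r^2 (Eq. (6) with m = 1) for r
Y0 = 0.157                          # intercept of Eq. (10) at X = 0
r_min = 1.0 / math.sqrt(1.0 + Y0 ** 2)
print(f"irreducible correlation coefficient: {r_min:.3f}")
```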
In conclusion, even though the channel is very close to ideal as far as the slope is concerned (m ≈ 1, no bias on average, very small regression noise), it can never be ideal for the correlation coefficient; there is always some significant correlation noise around the 45˚ line. Notice that there is no clear trend for the various language families, except for the Balto-Slavic family, which minimizes the regression noise X because m ≈ 1; these translations are therefore grouped towards the Y-axis. The noisiest languages are Norwegian, Cebuano and Haitian.
Let us now consider the sentences channel n_S, whose results are shown in Figure 8. Its points are further away from the origin than those of the words channel, therefore the noise-to-signal power ratio is greater than that of the words channel. Latin is, again, the least noisy language, together with Croatian and Basque. Moreover, as already noticed, the number of sentences tends to be larger than in Greek, therefore m > 1. The results concerning the number of characters, deducible from Table 3, would be very similar: because characters and words are very highly correlated (r > 0.990 for all languages, verified but not shown, just as for Italian literature [28]), these observations apply also to the characters channel.
Let us consider the interpunctions channel n_I, whose results are also shown in Figure 8. This channel is noisier than the n_W and n_S channels. Swedish is the least noisy language, Haitian the noisiest. Each language, in fact, introduces a very different distribution of interpunctions in a chapter, both in type (full stops, question marks, exclamation marks, commas, colons, semicolons) and in quantity, therefore changing the length of sentences, word intervals and interpunctions per sentence. The slope of the regression line drawn in Figure 8 is the lowest of the three channels examined so far. Let us consider the number of words per sentence, P_F, a deep-language variable. Figure 9 shows the results obtained. This channel has a large correlation noise, as we can see from the range of Y, a consequence of the very low correlation coefficients (Table 3). The least noisy language, again, is Latin; the noisiest is Norwegian.
The results of the channel concerning the number of words per interpunction, i.e. the word interval I_P, are also shown in Figure 9. The least noisy languages are Basque, Latin, Estonian and Croatian; the noisiest is Haitian. In general, the I_P channel is less noisy than the P_F channel: I_P, it seems, cannot be set as independently from Greek as P_F can. A likely explanation is that the word interval is empirically correlated with the short-term memory capacity, and this capacity not only is limited according to Miller's 7 ± 2 law [27], but it cannot change much in humans, regardless, of course, of the language used; therefore it varies less from language to language. This is not the case for P_F, a variable more linked to the output language, or to translation style and intended readers through a readability index (see Section 8), than to human short-term memory capacity.
The results of the channel concerning the number of interpunctions per sentence, M_F, are also shown in Figure 9. The least noisy language is again Croatian; the noisiest is again Haitian, with Y ≈ 60 (due to the very low correlation coefficient 0.012, practically zero) and X ≈ 0.3, not shown because far out of scale. Notice that the I_P and M_F channels are quite similar for most languages. Compared to the n_W, n_S and n_I channels, the deep-language variable channels are the noisiest. The reason seems to be, again, the different distribution of interpunctions. For these channels we have not drawn regression lines because the correlation coefficients are small. Let us summarize the main results of this section. The channels studied are differently affected by the translation noise. The most accurate channel is the words channel n_W, a finding that seems reasonable. Humans seem to express a given meaning with a number of words (i.e., finite strings of abstract signs, or characters) which cannot vary much, even if some languages (Hebrew, Welsh, Basque, etc.) do not share, according to scholars, a common ancestor with most of the others. This result seems to reflect something basic in human processing capabilities.
The number of sentences, and their length in words, i.e. P_F, can be treated more freely. We know that P_F strongly affects readability indices, as shown for Italian [28]; therefore this variable tends to be matched to the intended readers, with their specific reading ability, rather than to the original Greek readers of the Roman Empire.
Finally, we observe that, independently of the channel, the correlation noise is always larger than the regression noise, indicating that every translation tries as much as possible not to be biased but cannot avoid being decorrelated, with correlation coefficients that approximately decrease from words to sentences, to interpunctions, and down to the deep-language variables. Besides the noise-to-signal power ratio, communication channels can also be characterized by the channel capacity, as we discuss in the next section.

Channel Capacity
The noise-to-signal power ratio and its universal geometrical representation are not the only interesting ways of studying noisy channels. Noisy channels can also be characterized by a single variable, namely the channel capacity, or mutual information, defined by Shannon [26] between the stochastic variables x (input) and y (output); see also [33]. In the following subsections, we first recall the channel capacity of communication theory and define what we mean by "symbol"; second, we assess, for the first time, the size of the channel capacity obtainable with linguistic variables.

Channel Capacity According to Communication Theory
According to Shannon [26], under some assumptions, the capacity C (bits per symbol) of the channel X → Y is directly related to the channel signal-to-noise power ratio ρ = 1/R, according to:

C = log2(1 + 1/R). (12)

In our analysis the term "symbol" is defined according to the linguistic variable under study; for example, in the words channel the "symbol" is a word. In other words, we do not consider the classical information content of texts according to communication/information theory, which, to a first approximation, is measured by the entropy of letters [34], a concept applicable to machine translation but not to human information processing, which is based on the distribution of words, sentences and interpunctions. Indeed, the short-term memory responds to words, not to bits; therefore the use of entropy can be highly misleading in estimating the characteristics of the linguistic channels defined in the present paper (Appendix B).
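Equation (12) can be sketched in a few lines; the R values below are illustrative, not taken from Table 3.

```python
import math

def capacity(R):
    """Channel capacity in bits per symbol, Eq. (12): C = log2(1 + 1/R),
    with R the total noise-to-signal power ratio (so 1/R is signal-to-noise)."""
    return math.log2(1.0 + 1.0 / R)

# The less noisy the channel (smaller R), the larger the capacity
for R in (0.05, 0.5, 1.0):
    print(f"R = {R}: C = {capacity(R):.3f} bits/symbol")
```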
For a constant R = R_o, Equation (12) gives the minimum channel capacity if the noise is Gaussian. If the noise is not Gaussian, the actual channel capacity is larger than that given by (12) [26].
Of the two noise sources defined in Section 4, the correlation noise and the regression noise, the latter is deterministic (it could, in principle, be cancelled by scaling the output variable by 1/m), therefore the capacity given by Equation (12) is pessimistic. In any case, this is not of concern here, because Equation (12) can be used for comparing different translations.
We have already shown contours of constant capacity C (given, of course, by Equation (12)) in Figures 7-9, namely the black arcs of circles. At the origin of the Cartesian coordinates R = 0, therefore ρ = ∞ and C = ∞. This last result, valid for the continuous channel assumed in Equation (12), merely means that the channel does not impose any limit on the output information; in this case the mutual information coincides with the input self-information of the Greek texts.
Of the channels studied in Section 5, the words channel n_W has the largest channel capacity for most translations. Figure 10 shows the scatterplot between the capacities of the n_W and n_S channels. We notice that the two channels are quite correlated; for Welsh the two capacities are even practically identical. Figure 10 also shows the scatterplot between the capacities of the n_W and n_I channels: these two capacities are practically uncorrelated. In Appendix C we report the scatterplots of the capacities of the words and sentences channels against the deep-language channel capacities. In all cases we notice a poor correlation, except partially for the P_F channel, evidencing, again, the fact that every translation has its own pattern of interpunctions within a chapter, which determines P_F, I_P and M_F.
Some interesting observations can be made on the mixed scatterplots shown in Figure 11 between the I_P channel capacity and the n_W, n_S and I_P channel capacities. The correlation between these variables is evident: as I_P increases, thus loading the reader's short-term memory more, the channel capacities decrease. In other words, by decreasing this important deep-language variable I_P, channels tend to be closer to the ideal channels of words, sentences and I_P itself. Unlike the word interval I_P, the number of words per sentence P_F is well correlated only with its own channel capacity, Figure 12. As P_F approaches the Greek value (23.07, Table 2), the channel capacity increases. This behavior differs from that of Figure 11, where, as I_P approaches the Greek value 7.47, the I_P channel capacity decreases; the difference underlines that I_P seems to be more related to how the human brain processes texts (short-term memory), regardless of the particular language. In other words, translations do not follow the high Greek I_P. On the contrary, P_F is more related and matched to the intended readers through the readability index, which does not consider I_P [28].
In the next subsection we discuss how large the capacity of linguistic channels can be.

Channel Capacity Size
Two questions arise: 1) Are the channel capacities large? 2) How can we assess how large they are? Let us start by studying the sensitivity of the channel capacity to the parameters m and r. Figure 13 shows a universal chart, drawn from Equations (12) and (8), which describes the relationship between the channel capacity C and the slope m, as a function of the correlation coefficient r. For illustration, we have also reported the values of the words channel capacity of some translations.
The maxima of C are found, from Equation (12), where R is minimum, i.e., by setting dR/dm = 0 in Equation (8):

m = r^2. (13)

Therefore, from (8):

R_min = 1 − r^2. (14)

Consequently, from (12) we get:

C_max = log2(1 + 1/(1 − r^2)). (15)

Because of (15), in Figure 13 we can notice a very sharp increase only for very high correlation coefficients. In actual translations, however, the capacity can be significantly large, not too far from the maximum value obtainable from Equation (15). In fact, defining the normalized capacity C/C_max, Figure 13 and Figure 14 show how C/C_max varies. Notice that C/C_max practically follows the same mathematical function, regardless of the channel (words or sentences), because the correlation coefficient r is about the same for all languages (Table 3). The same result is also found for the interpunctions channel (not shown for brevity). For the P_F and I_P channels (Figure 14) no regularity emerges, because of the poor correlation coefficients, another sign that these deep-language variables depend more profoundly on the particular translation, not on the language. The M_F channel follows the same trend (not shown).
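These maxima can be verified numerically under the same assumption used in Section 4 (s_y/s_x = m/r), so that R depends only on m and r; the correlation coefficient below is illustrative.

```python
import math

def R(m, r):
    # Eq. (8): total noise-to-signal power ratio as a function of m and r
    return (m - 1.0) ** 2 + (1.0 - r ** 2) * (m / r) ** 2

r = 0.97                                  # illustrative correlation coefficient
m_star = r ** 2                           # Eq. (13): slope minimizing R
R_min = 1.0 - r ** 2                      # Eq. (14): minimum of R
C_max = math.log2(1.0 + 1.0 / R_min)      # Eq. (15): maximum capacity

# Check numerically that m = r^2 is indeed a minimum of R
eps = 1e-4
assert R(m_star, r) <= R(m_star + eps, r)
assert R(m_star, r) <= R(m_star - eps, r)
print(f"m* = {m_star:.4f}, R_min = {R_min:.4f}, C_max = {C_max:.3f} bits/symbol")
```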
In conclusion, the capacities of the n_W, n_S and n_I channels follow the universal chart very closely because of similarly high correlation coefficients; on the contrary, the capacities of the P_F and I_P channels are more spread out because their correlation coefficients vary greatly from translation to translation.

Word Interval and Short-Term Memory
As studied and discussed in [28], the number of words per interpunction, namely the word interval I_P, varies in the same range as the short-term memory capacity given by Miller's 7 ± 2 law [27] (a range where 95% of all occurrences are found) and is very likely related to it, because interpunctions organize small portions of more complex arguments in short chunks of text. Moreover, when I_P is drawn against the number of words per sentence P_F, I_P tends to saturate to a horizontal asymptote as P_F increases. In other words, even if sentences get longer, I_P cannot get much larger than the upper limit of Miller's law (namely 9), because of the constraints imposed by the short-term memory capacity of readers.
Empirically (by best fit), the average value of I_P is related to the average value of P_F according to the relationship [28]:

I_P = I_P∞ − (I_P∞ − 1) exp[−(P_F − 1)/(P_Fo − 1)], (16)

where I_P∞ gives the horizontal asymptote and P_Fo gives the value of P_F at which the exponential falls to 1/e of its maximum value. We apply Equation (16) to the NT translations. Because both I_P and P_F depend on the translation, we find different constants in Equation (16), listed in Table 4, together with the data concerning the readability index discussed in Section 8. Figure 15 shows the scatterplot concerning Greek, Latin and Hebrew. As for Italian literature (see Figure 16 of [28]), I_P spreads in Miller's range. Not surprisingly, the ancient readers of these texts had the same short-term memory capacity as modern readers, i.e., they followed Miller's 7 ± 2 law. This finding is confirmed by the results concerning modern languages, for which, however, the spread within Miller's range can differ from translation to translation.
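A minimal sketch, assuming for Eq. (16) a saturating-exponential form consistent with its description (horizontal asymptote I_P∞, exponential reduced to 1/e of its maximum at P_F = P_Fo, and I_P = 1 for a one-word sentence); both the functional form written here and the constants used are illustrative assumptions, not the Table 4 best-fit values.

```python
import math

def word_interval(PF, IP_inf, PF_o):
    """Average word interval I_P versus words per sentence P_F:
    saturating exponential with horizontal asymptote IP_inf; the
    exponential falls to 1/e of its maximum at P_F = PF_o.
    (Assumed form, consistent with the description of Eq. (16).)"""
    return IP_inf - (IP_inf - 1.0) * math.exp(-(PF - 1.0) / (PF_o - 1.0))

# Illustrative constants: asymptote inside Miller's 7 +/- 2 range
IP_inf, PF_o = 8.5, 10.0
for PF in (5, 15, 40):
    print(f"P_F = {PF:2d}: I_P = {word_interval(PF, IP_inf, PF_o):.2f}")
```

Even for very long sentences, I_P stays below the asymptote, mirroring the short-term memory constraint discussed above.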
Some translations, such as Latin and Hebrew (Figure 15), tend to use smaller values of I_P, therefore loading the reader's short-term memory less than other translations do, e.g., Italian, French and English (see the asymptote values I_P∞ in Table 4). In Appendix D we show more graphical examples. Figure 16 shows all the best-fit models of Table 4, and also the best fit for Greek with ±1 standard deviation calculated from the models of Table 4. In conclusion, each translation tends to address readers with different reading abilities, because small I_P values are better matched to readers with a small short-term memory capacity, who can therefore handle only short sentences; this correlates well with a large readability index, as we show in the next section.

Readability Index
As discussed in [28], after an in-depth review based on the many references listed there, to which we refer readers for further details, a readability formula gives an index that anyone can calculate directly and easily, so that a writer can sufficiently match text and expected readers. Its "ingredients" are understandable by anyone, because they are interwoven with long-lasting writing and reading experience based on characters, words and sentences. A readability formula gives an index based on the same stochastic variables, regardless of the text considered; thus it provides an objective measurement for comparing different texts or authors. A definitive, objective readability formula, or software-developed method, is very unlikely to be found or accepted by everyone. On the contrary, instead of absolute readability, readability differences can be more useful and meaningful, and the classical readability formulae provide these differences easily and directly. In particular, this last observation justifies our present proposal to adopt a readability formula that can be used for comparing texts of different languages, because most of them do not have a readability formula, and few adapt formulae studied for English texts to their own [35] [36]. The proposed formula, of course, does not exclude using other readability formulae, e.g., the large choice available for English [37], but it allows comparing, on the same ground, the readability of texts written in different languages.
For this purpose, we propose to adopt, as a calque, the readability formula used for Italian, amply studied in [28] and known by the acronym GULPEASE [38], given by:

G = 89 − 10 c/p + 300 f/p. (17a)

In Equation (17a), p is the total number of words in the text considered, c is the number of characters contained in the p words, and f is the number of sentences. Notice that Equation (17a), like all readability formulae found in the literature, does not contain any reference to interpunctions; therefore it does not consider the very important parameter linked to short-term memory capacity, namely the word interval I_P.
G can be interpreted as a readability index by considering the number of years of school attended in Italy's school system, as shown in Figure 17: the larger G, the more readable the text. By noting that C_P = c/p and f/p = 1/P_F, G can be written as:

G = 89 − 10 C_P + 300/P_F. (17b)

In [28] we have shown that the term G_C = 10 × C_P (loosely referred to as the semantic term) varies very little from text to text and across centuries, while the term G_F = 300/P_F (loosely referred to as the syntactic term) varies very much and, in practice, determines the readability index. We propose to use this formula also for the other languages listed in Table 1, by scaling the constant 10 of the semantic term according to the ratio between the average number of characters per word in Italian and that of the language considered (Table 1). The rationale for this choice is that C_P is typical of a language and, if not scaled, would bias G without really quantifying the change in reading difficulty of readers, who are accustomed to reading, on average, shorter or longer words in their language than those found in Italian. In other words, this scaling avoids changing G for the sole reason that a language has, on average, words shorter or longer than Italian. On the other hand, we maintain the constant 300 because P_F depends significantly on the reader's reading ability and short-term memory capacity [28], in other words on the translator's choice; therefore the formula already takes care of the reader to whom the translation is addressed. Finally, notice that the constant 89 just sets the ordinate scale, therefore it has no impact on comparisons. In conclusion, the readability formula of a text written in a language with average C_P characters per word is obtained by replacing the constant 10 in (17b) with 10 multiplied by the ratio between the Italian average characters per word and that of the language considered. (18) Using Equation (18), Figure 18 shows G_C and G_F versus G for Greek, Latin and all the other languages, with some more examples shown in Appendix E. We can notice that G_F largely determines G, compared to G_C.
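The two formulae can be sketched as follows; the text counts and the Italian average characters per word (4.48) used below are placeholders for illustration, not values from the paper's tables.

```python
def gulpease(chars, words, sentences):
    """GULPEASE readability index, Eq. (17a): G = 89 - 10*c/p + 300*f/p."""
    return 89.0 - 10.0 * chars / words + 300.0 * sentences / words

def gulpease_scaled(chars, words, sentences, cp_language, cp_italian=4.48):
    """Cross-language variant, Eq. (18): the constant 10 of the semantic
    term is scaled by the ratio between the Italian average characters per
    word and that of the language (cp_italian here is a placeholder)."""
    k = 10.0 * cp_italian / cp_language
    return 89.0 - k * chars / words + 300.0 * sentences / words

# Illustrative counts: 1000 words, 4500 characters, 60 sentences
print(gulpease(4500, 1000, 60))                      # semantic term 45, syntactic 18
print(gulpease_scaled(4500, 1000, 60, cp_language=5.2))
```

For a language with longer words than Italian, the scaling lowers the semantic penalty so that G is not reduced merely because that language's words are longer on average.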
The slope of the regression line relating G_C to G, drawn in Figure 18, is −0.074, practically zero, thus confirming that G is mainly determined by G_F. Figure 19 shows the scatterplot and the regression lines between the values of G in a translation and those in Greek, and the histogram of the difference (error) between the actual values and the regression-line values. Table 4 reports average values and standard deviations for all translations, together with the slope and correlation coefficient of the regression lines shown in Figure 19. As we can notice, each translation sets different readability values for its intended readers, with a large spread. In other words, as mentioned above, the number of words per sentence P_F distinguishes the translations significantly. From Table 4 we notice that Welsh, Albanian and Greek have the lowest average G (57 - 58), making them the least readable translations, while Hebrew (69.64), followed by Polish and Czech, are the most readable translations. The texts at these two extremes, to be "easy" to read according to Figure 17, require about 8 years of equivalent Italian schooling for G ≈ 57 and about 6.5 years for G ≈ 70. They would become "difficult", "very difficult" or even "almost unintelligible" to readers with very few years of schooling.
In conclusion, Equation (18) can be useful for comparing the readability of texts (not necessarily translations) written in different languages, because it provides a "common ground" for interpreting them, namely Figure 17, which can be used as a first guide to assess readability according to the years of schooling.

Different NT Translations within the Same Language

In Table 5 we notice that even the number of words and sentences can change within the same language, across versions sometimes labelled "easy-to-read", in "modern" language, etc. In English, for example, it is clear that the King James Version is the most difficult to read (G = 57.2), although it loads the reader's short-term memory less. In German, the versions tend to be much closer, even Luther's, so that they seem to address very similar audiences.
The spread of the values within the same language can be a sizeable fraction of the overall range calculable from Table 1 and Table 2. In conclusion, it is clear that each different NT translation within the same language addresses a different audience, as can be noticed from the range of the linguistic parameters; more interestingly, a translation in one language can be confused, mathematically, with a translation in another language. In other words, this preliminary sampling seems to confirm that language does not play the only role in translation: that role has to be shared mainly with the reader's reading ability (i.e., P_F, G) and short-term memory capacity (I_P).

Literary Text Translations: Treasure Island
Another question arises: are the above results only applicable to NT translations, or can they also be applied to translations of literary texts, such as novels? In this section we show, preliminarily with just one example, that novels tend to show similar statistics, but with more constraints on the translations than those found in the NT translations.
We have done the following exercise. We have studied the translations of Treasure Island (by R.L. Stevenson) from the original English text into Italian, French and German, by considering each chapter as a text unit (34 chapters). The comparison with the NT translations must be done, of course, by starting first with the English version of the NT and then studying its translations. Only after this study can we consider Treasure Island as input text and calculate the same statistics. Therefore, we take the English NT as the reference (input) language and Italian, French and German as output languages, as if these NT versions were obtained by translating the English text, not the original Greek text. This hypothesis assumes, of course, that if the Italian, French and German translators had started from the English version of the NT, they would have ended up with the same text translated from Greek. This might be reasonable, although not directly verifiable; we show below that the assumption can be justified. Table 6 reports the statistics concerning the original text of Treasure Island and its translations. Table 7 and Table 8 report the results on channel capacity obtained by considering English as the original NT text, while Table 9 and Table 10 report the results on channel capacity concerning the direct translations of Treasure Island into Italian, French and German. We notice that the Italian translation uses the smallest number of words and sentences, and also has the highest correlation coefficients for all variables; therefore, its channels also have the largest capacities. In other words, the Italian translation is, mathematically, the closest to the English text, which appears surprising if we consider the different linguistic families.
Let us examine the single channels. In the words channel n_W we notice that the slope m and correlation coefficient r of the three languages are about the same in both cases (Table 7 and Table 9); therefore our hypothesis on the translation of the English NT into the other languages, mentioned in the previous paragraph, is justified. More interestingly, the channel capacity is about the same in both cases and very close to the maxima given by Equation (15). In the sentences channel n_S, on the contrary, m and r of the three languages are significantly different in the two cases. This is, of course, confirmed by the different capacities. This trend is further enhanced in the P_F and I_P channels (Table 8 and Table 10), further evidence that, as we pass from words to sentences, to P_F and to I_P (or M_F), each translation has quite different ways of using interpunctions for its intended readers, thereby better matching the reader's reading ability and short-term memory capacity.
Finally, it is very interesting to notice, in the n_W and n_S channels (Table 7 and Table 9), that the NT translation is, mathematically, more accurate and respectful of the original Greek text than the translation of Treasure Island is of the English original. On the contrary, in the P_F and I_P channels, the Treasure Island translations are more accurate than the NT translations, very likely because all dialogues must be strictly respected in any translation.
In conclusion, the statistics of words and sentences of a novel seem to be similar to those found in the NT translations. For example, the ranking of the number of sentences, from minimum to maximum, is the same both in the NT and in the Treasure Island translations: Italian, English, French, German. It is almost the same for words, namely Italian, German, English, French for the NT translations, and Italian, English, German, French for the Treasure Island translations. The translation of a novel seems to be more respectful of the original text than the NT translations as regards P_F and I_P, mainly because translators must consider the presence of dialogues, whose fraction of the total text can, however, vary largely among novels, according to the author's style, etc. Because these results refer to just one particular case, they should be further assessed with other literary (novel) translations, a study well beyond the aim of this paper.

A General Theory of Translation: From Any Language to Any Other Language
It is possible to extend the statistical theory outlined in the previous sections so as to arrive at a general theory of translation applicable to any alphabetical language. By knowing the statistics of the various linguistic variables studied in the previous sections, obtained in the translation channel from Greek to the other languages, it is possible, as we show below, to estimate the statistics obtainable in the translation channel from any language to any other language of those listed in Table 1. The necessary data for extending the theory are those reported in Table 3 and Table 4. The theory can also be applied to channels of texts belonging to the same language (not shown for brevity): for example, the channel that transforms words into sentences in a text can be compared to the channel that transforms words into sentences in a different text, both written in the same language. This comparison can be useful to study how texts of the same author may have changed over time, or to compare texts of different authors. Figure 20 shows, schematically, the block diagram of the direct channels, from a single language to any other language Y_k, and the flow chart of the reverse channels, from any language Y_j to the same language Y_k (Greek, in the channels of Figure 20). In other words, in the direct channel the translation is from a single language (Greek, or Latin, or Esperanto, etc.) to another language; therefore, if the starting language is Greek, the translations are those discussed in the previous sections. In the reverse channel the output language is the same for all translations; therefore, if the output language is Greek, the translations are from the input languages Latin, Esperanto, etc. So far, we have studied only one possible direct channel (from Greek to the other languages) and none of the reverse channels. In this section we study all possible direct and reverse channels, in order to propose a general statistical theory of translation.
We first calculate the noise-to-signal power ratio obtainable in the general theory from the data reported in Table 3 and Table 4. Then, we show that the direct and reverse channels concerning any couple of languages are not symmetric.

Noise-to-Signal Power Ratio
Let us consider two languages Y_k and Y_j, and let us refer to Greek explicitly as language X. With reference to the ideal channel whose output is X (self-translation), we have found that the same variable in languages k and j is related to the corresponding Greek variable x by the linear relationships:

y_k = m_k x + n_k ; y_j = m_j x + n_j (19)

with slopes m_k, m_j and correlation coefficients r_k, r_j given in Table 3, and with n_k, n_j the additive noise sources of the two channels.
Let us refer to the 36 possible translations from any language k = 1, ..., 37 (including Greek) to a language j. In other words, language k now plays the role played before by Greek. By eliminating x, i.e. Greek, from Equation (19), we get the linear relationship between the input language k and the output language j:

y_j = (m_j/m_k) y_k − (m_j/m_k) n_k + n_j (20)

Compared to the reference language y_k, the slope is therefore given by:

m_kj = m_j/m_k (21)

Therefore, the regression noise-to-signal power ratio R_m of the channel is readily found, according to Equation (3), as:

R_m = (1 − m_j/m_k)^2 (22)

Notice that R_m depends only on the known slopes of the translations from Greek (Table 3).
Let us calculate the correlation noise-to-signal power ratio R_r. To apply Equation (6), we must insert the unknown correlation coefficient r_kj = r_jk = r between y_j and y_k, due, of course, to the two noise sources in Equation (20). We can calculate its value from the correlation coefficients r_k and r_j reported in Table 3.
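If the two noise sources in Equation (20) are independent, the correlation coefficient between y_j and y_k reduces to the product r = r_k r_j, a standard result that can be checked with a quick Monte Carlo sketch. All slopes and noise levels below are hypothetical, not taken from Table 3:

```python
import math
import random

def corr(a, b):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

random.seed(1)
N = 100_000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
# Two channels sharing the same Greek input x, as in Equation (19),
# with independent additive noises (hypothetical slopes and sigmas):
y_k = [0.9 * xi + random.gauss(0.0, 0.3) for xi in x]
y_j = [1.2 * xi + random.gauss(0.0, 0.5) for xi in x]

r_k, r_j, r_kj = corr(x, y_k), corr(x, y_j), corr(y_k, y_j)
print(round(r_kj, 3), round(r_k * r_j, 3))  # the two values nearly coincide
```

The agreement holds because, with independent noises, cov(y_k, y_j) involves only the common input x, so the product rule follows directly from the definitions of r_k and r_j.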
First, we notice that the total noise added to the regression line relating the output variable y_j to the input variable y_k is the sum of the powers of the two independent noise sources appearing in Equation (20):

N_kj = N_j + (m_j/m_k)^2 N_k (24)

Equation (24) has a geometric representation [39]: it can be seen as an application of Pythagoras' theorem, the two noise terms being orthogonal because they are independent; from this representation the correlation coefficient between y_j and y_k is found to be the product r = r_k r_j (25). Now, by Equation (6), the correlation noise-to-signal power ratio in the translation channel from language k to language j is given by:

R_r = (m_j/m_k)^2 (1 − r^2)/r^2 (26)

In conclusion, the total noise-to-signal power ratio in the translation channel from language k to language j, for a given stochastic variable, is given by:

R = R_m + R_r (27)

In the channels concerning the numbers of words, sentences and interpunctions, slopes can vary in a wide range (Table 1 and Table 2) while keeping very high correlation coefficients, i.e. R_r < R_m. On the contrary, in the channels concerning the deep-language variables P_F, I_P (with some exceptions), M_F, and the readability index G, we mostly observe R_m < R_r. In the P_F channel, for example, in Table 3 we read a correlation coefficient as low as 0.529, with a significant impact on the noise-to-signal power ratio.
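Putting the pieces together, the total noise-to-signal power ratio of the k → j channel can be computed directly from the regression parameters of the two Greek channels. The sketch below assumes the forms R_m = (1 − m_kj)^2, R_r = m_kj^2 (1 − r_kj^2)/r_kj^2 and the product rule r_kj = r_k r_j for independent noises; the numerical inputs are hypothetical, not Table 3 values:

```python
def channel_noise_to_signal(m_k, r_k, m_j, r_j):
    """Total noise-to-signal power ratio of the k -> j channel, built
    from the Greek -> k and Greek -> j regression parameters.
    Assumed forms: slope m_kj = m_j/m_k, correlation r_kj = r_k*r_j,
    R_m = (1 - m_kj)**2 and R_r = m_kj**2 * (1 - r_kj**2) / r_kj**2."""
    m_kj = m_j / m_k
    r_kj = r_k * r_j
    R_m = (1.0 - m_kj) ** 2                            # regression noise
    R_r = m_kj ** 2 * (1.0 - r_kj ** 2) / r_kj ** 2    # correlation noise
    return R_m + R_r


# Hypothetical Greek -> k and Greek -> j parameters (not Table 3 values):
R_total = channel_noise_to_signal(0.95, 0.99, 1.10, 0.97)
print(round(R_total, 3))  # prints 0.138
```

An ideal self-translation (m_k = m_j = 1, r_k = r_j = 1) gives R = 0, as expected for a noiseless channel.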
From Figures 21-23 we can calculate the direct and reverse channel capacities. Figure 24 shows the scatterplots between C_kj (direct channel) and C_jk (reverse channel) for some languages in the words channel n_W. These scatterplots show that direct and reverse channels are not very different. Although C_kj ≠ C_jk, as we establish in the next subsection, the two capacities are very similar for all variables and languages, regardless of their absolute value. In other words, a common underlying structure emerges from considering channel capacities, which seems to govern the textual/verbal communication channels defined here, as we can see in Figure 25. In Appendix F we show results for the other linguistic channels.
In the next subsection we show that C_kj ≠ C_jk.

Direct and Reverse Channels Are Not Symmetric
Are the direct and reverse channels concerning a couple of languages, e.g. the translations from Greek to English and from English to Greek, symmetric? We can answer this question by considering the channel capacity.
The specific question now becomes: is the capacity C_kj (bits per symbol) of the (direct) channel from language k to language j equal to the capacity C_jk of the (reverse) channel from language j to language k? In other words, can the two languages be exchanged in the input-output relationship without changing the statistical characteristics of the translation channel? According to communication theory [26], this happens in telecommunication channels affected by additive white Gaussian noise, but it is not true in translation channels, as we now show. We establish that no couple of direct and reverse channels is symmetric, unless m_k = m_j and r_k = r_j, a case never found. The reason for this asymmetry is that the noise added to the ideal (self-translation) channel to obtain the text in another language is always statistically different in the two directions. According to Equations (12) and (27), and recalling that r_ij = r_ji = r, the two channel capacities are equal only if r takes the particular value given by Equation (29), obtained after standard algebraic passages by setting x = m_j/m_k in the equality condition (28). To yield real values, the radicand in Equation (29) must be positive, and to yield a correlation coefficient the solution must be smaller than 1; therefore we get the range in (31). The lower limit in (31) is always satisfied because x^2 + 1 > 0; the upper limit gives the inequality (32), which is not satisfied by the slopes and correlation coefficients found in our channels; therefore C_kj ≠ C_jk. In the next subsection we assess how large the capacity difference is, in other words, how asymmetric direct and reverse channels are. We can rank the channels according to the normalized RMS difference (%); Table 12 shows its overall average. The least variable channel is the readability channel, followed by the interpunction channel, the words and sentences channels, and then the deep-language channels, thus confirming that these latter variables are treated by translators with fewer constraints than the number of words or sentences, unless dialogues have to be respected, as seen with the Treasure Island translations.
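The asymmetry can be illustrated numerically. The capacity expression below, C = (1/2) log2(1 + 1/R), is only an assumed Gaussian-channel stand-in for the paper's Equation (12), and the regression parameters are hypothetical; still, the sketch shows how swapping input and output changes the capacity:

```python
import math

def total_R(m_in, r_in, m_out, r_out):
    # Noise-to-signal ratio of the channel from language "in" to "out",
    # assuming slope m_out/m_in and correlation r_in*r_out (independent noises).
    m = m_out / m_in
    r = r_in * r_out
    return (1.0 - m) ** 2 + m ** 2 * (1.0 - r ** 2) / r ** 2

def capacity(R):
    # Assumed Gaussian-like capacity in bits per symbol (illustrative only).
    return 0.5 * math.log2(1.0 + 1.0 / R)

# Hypothetical regression parameters of the Greek -> k and Greek -> j channels:
m_k, r_k = 0.95, 0.99
m_j, r_j = 1.10, 0.97

C_kj = capacity(total_R(m_k, r_k, m_j, r_j))   # direct channel k -> j
C_jk = capacity(total_R(m_j, r_j, m_k, r_k))   # reverse channel j -> k
print(round(C_kj, 2), round(C_jk, 2))  # close, but not equal
```

Because the noise model is not symmetric in the two directions, the two capacities differ whenever the slopes differ, in line with the conclusion of this subsection.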
In other words, in the NT translations differences are mainly due to specific linguistic variables, not to the particular language.

Conclusions
We have proposed a unifying statistical theory of translation, based on communication theory, which involves linguistic stochastic variables, some of which are not considered by scholars. Its main mathematical characteristics have emerged by studying the translation of most NT books.
When a text written in a language is translated into another language, all linguistic variables do numerically change. To study these apparently chaotic data we have characterized any translation as a complex communication channel affected by "noise", studied according to Communication Theory applied for the first time to this channel. The new theory deals with aspects of languages more complex than those currently considered in machine translations. The input language is the "signal", the output language is a "replica" of the input language, but largely perturbed by noise. For the output language, this noise is indispensable for conveying the meaning of the input language to its readers. To study these channels, we have defined a suitable noise-to-signal power ratio and applied a geometrical representation.
All channels studied are differently affected by translation noise. The most accurate channel is the word channel n_W, a finding that seems reasonable. It emerges that humans seem to express a given meaning with a number of words (i.e., finite strings of abstract signs, characters) which cannot vary much, even when languages do not share a common ancestor. On the contrary, the number of sentences, and especially their length in words P_F, are treated more freely by translators. P_F affects readability indices very much; therefore this variable tends to be better matched to the intended readers, with their specific reading ability. Independently of the different parallel channels (one for each variable), the correlation noise (due to a correlation coefficient r < 1) is mostly larger than the regression noise (due to a regression-line slope m ≠ 1), indicating that every translation tries as much as possible not to be biased, but it cannot avoid being decorrelated, with correlation coefficients which approximately decrease from words, to sentences, to interpunctions, and down to the deep-language variables P_F, I_P, M_F and C_P.
Different translations of the NT within the same language, mathematically, can be quite different and they can even seem to belong to different languages.
In other words, in language translations differences are mainly due to specific linguistic variables, not to the particular language. Clearly, they are matched to different audiences, an aspect not explicitly considered in machine translations.
Besides the noise-to-signal power ratio, communication channels can also be characterized by the channel capacity (bits per symbol, the symbol being suitably defined). This parameter can be relatively large, very close to the maximum obtainable value, for the n_W, n_S and n_I channels, and smaller for the P_F, I_P and M_F channels. We have found that the NT translations are similar to translations of literary texts, as shown for the novel Treasure Island translated from English into Italian, French and German, for the n_W, n_S and n_I channels. On the contrary, the translation of novels seems to set more stringent constraints on the translators in the P_F and I_P channels, because dialogues must be strictly maintained; this is a topic to be further researched.
The number of words per interpunction, I_P, varies in the same range as the short-term memory capacity. Drawn against the number of words per sentence P_F, I_P tends to saturate to a horizontal asymptote as P_F increases: even though sentences get longer, I_P cannot exceed, approximately, the upper limit of Miller's law, because of the constraints imposed by readers' short-term memory capacity.
We have defined a formula for the readability index of any alphabetical language, based on a calque of the readability formula used for Italian, both to provide one to languages that have none and to estimate, on common grounds, the readability of texts belonging to different languages/translations.
Finally, we have extended the statistical theory outlined before to a general theory of translation applicable to any alphabetical language, even to texts written in the same language. The general theory shows that direct and reverse channels are not symmetric.
In conclusion, a common underlying statistical structure governing human textual/verbal communication channels (not defeated by the mythical biblical Tower of Babel) seems to emerge from the findings. The main result is that the statistical and communication characteristics of a text and of its translations into other languages seem to depend not only on the particular language (mainly through the number of words and sentences) but also on the particular translation, because the text is very much characterized by the reading ability and short-term memory capacity of its intended readers, aspects not explicitly considered in machine translations. These conclusions seem to be long-lasting, because they apply also to ancient Roman and Greek readers. Future research should extend the general theory to non-alphabetical languages.

Appendix B. Entropy and human information-processing
The short-term memory capacity follows Miller's 7 ± 2 law [28]. Notice, however, that the range of Miller's law does not refer to bits, but to a "buffer" in which "chunks" of information are stored, of the type that can be "compressed", such as sequences of words or sequences of numbers (see [28] and the references cited there). In other words, humans process information differently from translation machines. As a consequence, the entropy of a language may be misleading in studying the linguistic channels defined in this paper. This point is now illustrated with an example.
Let us consider the total number of words W (Table 1) of the translations into English, French, German, Italian, Russian and Spanish. The entropy of a language referred to single letters is termed F_1 by Shannon [34]. Estimated values of F_1 for the mentioned languages are reported in Table B1.
Now, the total number of information bits produced can be estimated, according to Communication/Information theory, as the total number of characters multiplied by F_1, i.e. B = F_1 C_P W. If the languages are ranked according to W and according to B, the two lists are identical only for the first three lines (Russian, Greek and Italian); then they diverge. Now, the short-term memory responds to words, not to bits; therefore the use of entropy can be highly misleading (e.g., see German, English, French and Spanish) in estimating the quantities and characteristics of the linguistic channels defined in this paper.
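The point can be sketched numerically: ranking languages by total words W and by estimated bits B = F_1 C_P W generally produces different orders. All numerical values below are hypothetical placeholders, not the figures of Table 1 or Table B1:

```python
# Hypothetical (placeholder) values: total words W, characters per word C_P,
# and Shannon first-order entropy F1 (bits per character) for each language.
data = {
    "English": {"W": 180_000, "C_P": 4.2, "F1": 4.1},
    "German":  {"W": 175_000, "C_P": 5.1, "F1": 4.1},
    "French":  {"W": 190_000, "C_P": 4.4, "F1": 4.0},
    "Italian": {"W": 170_000, "C_P": 4.5, "F1": 4.0},
}

# Estimated information bits: B = F1 * C_P * W (characters times bits/character).
bits = {lang: d["F1"] * d["C_P"] * d["W"] for lang, d in data.items()}

by_words = sorted(data, key=lambda lang: data[lang]["W"])
by_bits = sorted(bits, key=bits.get)
print(by_words)
print(by_bits)  # the two rankings need not coincide
```

With these placeholder numbers, German moves from second place by word count to last place by bit count, because its longer words inflate the character (and hence bit) total, which is exactly why a bit-based ranking can mislead a word-based (short-term memory) analysis.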