1. Linguistic Communication Channels
Any language can communicate—across space and time—personal and intimate thoughts, stories and knowledge through literary texts (fiction), essays and scientific texts.
In recent papers [1] [2] [3], we have developed a general statistical theory of the translation of literary texts, based on communication theory, which involves suitably defined linguistic stochastic variables and communication channels. By “translation” we mean not only the conversion of a text from one language to another—what is properly understood, of course, as translation—but also how some linguistic parameters of a text are related to those of another text in the same language. In the theory, therefore, “translation” also refers to the case in which a text is compared (metaphorically “translated”) with another text, whatever the language of the two texts.
In the literature, most studies on relationships between texts concern translation, because of the importance of automatic (i.e., machine) translation. Translation transfers meaning from one set of sequential symbols into another set of sequential symbols and was studied as a language-learning methodology or as part of comparative literature. Over time the subject became increasingly interdisciplinary and specialized, and theories and models have been imported from other disciplines [4] [5]. References [6] - [12] report results not based on the mathematical analysis of texts, as we have done [1] [2] [3]. When a mathematical approach is used, as in References [13] - [25], most of these studies concern neither the aspects of Shannon’s communication theory [26] nor the fundamental connection that some linguistic variables have with readers’ reading ability and short-term memory capacity [1] [2] [3]. In fact, these studies are mainly concerned with automatic translations, not with the response of human readers. Very often they refer to only one linguistic variable, e.g. phrases [24]. As stated in [25], statistical automatic translation is a process in which the text to be translated is “decoded” by eliminating the noise, i.e. by adjusting lexical and syntactic divergences to reveal the intended message. In our theory, on the contrary, what we define as “noise”—given by quantitative differences between the source text (input) and the translated text (output)—must not be eliminated, because it makes the translation readable and matched to the reader’s short-term memory capacity, a connection never considered in the mentioned references.
Since the 1950s, automatic approaches to text translation have been developed and have now reached a level at which machine translations are of practical use. References [10] - [44] are a small sample of the vast literature on machine translation, all characterized by the same paradigm.
However, as machine translation becomes very popular, its quality also becomes increasingly critical, and human evaluation and intervention are often necessary to arrive at an acceptable text quality. Of course, human evaluation can only be done by experts; therefore it is an expensive and time-consuming activity. To avoid this cost, it is necessary to develop mathematical algorithms which approximate human judgment [27]. The theory developed in [1] [2] [3] and the advances presented in the present paper can set some benchmarks for assessing translation quality.
The variables considered in the theory are: total number of words W, sentences S and interpunctions I; readability index G for any alphabetical language; number of words nW, sentences nS and interpunctions nI per chapter (or any chosen subdivision of a literary text, large enough to provide reliable statistics, e.g. a few hundred words); number of characters per word CP, words per sentence PF, words per interpunction IP (this parameter, also called “word interval”, is linked to the short-term memory capacity of readers [1] [2] [3]) and interpunctions per sentence MF (this parameter also gives the number of word intervals IP contained in a sentence).
To study the chaotic data that emerge, the theory compares a text (the reference, or input, text) with another text (the output) through a complex communication channel—made of several parallel channels, two of which are considered in the present paper—in which both input and output are affected by “noise”, i.e. by a different scattering of the data around an average relationship, a regression line in the theory.
In [3] we have shown how much the mutual mathematical relationships of texts in a language are saved or lost in translating them into another language. To make objective comparisons, we have defined the likeness index IL, based on probability and communication theory of noisy digital channels.
We have shown (e.g., see Section 4 of [3]) that two linguistic variables—e.g. nS and nW, or MF and nS—can be linearly linked by regression lines. This is a general feature of texts. For example, if we consider the regression line linking nS to nW in a reference text and that found in another text (e.g., written in the same language), it is possible to link nS of the first text to nS of the second text with another regression line without explicitly calculating its parameters (slope and correlation coefficient) from the samples, because the mathematical problem has the same structure as the theory developed in Section 11 of [2]. The theory, of course, does not consider the meaning of texts.
In the present paper, we apply the theory to compare how a literary character speaks to different audiences by diversifying and adjusting two important communication channels, namely the “sentences channel” and the “interpunctions channel”. In other words, we study how an author shapes a main character’s speech to different audiences by modulating some of the linguistic parameters mentioned above. To show the possibilities and usefulness of the theory, we apply it to a vast and remarkable literary corpus written by an Italian mystic of the 20th century, Maria Valtorta, whose texts (in Italian) have been studied with a multidisciplinary approach in recent years [45] [46] [47] [48]. A similar approach can, of course, be applied to any other literary corpus written in any alphabetical language.
After this introduction, Section 2 recalls the fundamental relationships present in linguistic communication channels; Section 3 reports some biographical data on Maria Valtorta and her literary corpus; Section 4 recalls and applies a useful vectors plane of linguistic variables; Section 5 defines the theoretical signal-to-noise ratio in literary communication channels; Section 6 discusses the experimental signal-to-noise ratio obtained with Monte Carlo simulations; Section 7 uses the likeness index and the symmetry index to compare channels; finally Section 8 summarizes the main points of the paper and draws a conclusion. Appendices A and B report full data banks useful for assessing some relationships studied in the main text.
2. Fundamental Relationships in Linguistic Communication Channels
In this section we recall the general theory of linguistic channels. In a text, an independent (reference) variable x (e.g., nW) and a dependent variable y (e.g., nS) can be related by the regression line passing through the origin of the axes:
$y = m x$ (1)

In Equation (1), m is the slope of the line.
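As a minimal sketch (with illustrative data, not values from the texts studied here), the slope of the regression line through the origin and the correlation coefficient used in the following can be estimated as:

```python
import numpy as np

def origin_regression(x, y):
    """Slope m of the regression line y = m*x forced through the origin,
    and the Pearson correlation coefficient r measuring the scattering."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.sum(x * y) / np.sum(x * x)   # least-squares slope through (0, 0)
    r = np.corrcoef(x, y)[0, 1]         # scattering of the data around the line
    return m, r

# Illustrative samples: words per block (x) and sentences per block (y)
m, r = origin_regression([120, 250, 310, 480, 560], [8, 16, 21, 30, 37])
```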
Let us consider two different texts $Y_k$ and $Y_j$, e.g. the sermons that a character, in the literary fiction, addresses to audience k and to audience j. For these texts, we can write more general linear relationships, which take care of the scattering of the data—measured by the correlation coefficients $r_k$ and $r_j$, respectively, not considered in Equation (1)—around the average values (measured by the slopes $m_k$ and $m_j$):

$y_k = m_k x + n_k$ (2a)

$y_j = m_j x + n_j$ (2b)
As is known, the linear model of Equation (1) connects x and y only on the average (through m), while the linear model of Equation (2) introduces additive “noise” through the stochastic variables $n_k$ and $n_j$, with zero mean value [1] [2] [3]. The noise is due to a correlation coefficient smaller than unity, not considered in Equation (1).
We can compare two texts by eliminating x; in other words, we compare the output variable y for the same value of the input variable x. In the example just mentioned, we can compare the number of sentences in two texts—for an equal number of words—by considering not only the average relationship, Equation (1), but also the scattering of the data, measured by their correlation, Equation (2). We refer to this communication channel as the “sentences channel”.
If the linear relationship is between the number of interpunctions per sentence $M_F$ and the number of sentences $n_S$, then by eliminating $n_S$ we get the linear relationship between $M_F$ of the first text and $M_F$ of the second text. We refer to this communication channel as the “interpunctions channel”. Notice that, because $M_F$ is also the number of word intervals $I_P$ [1] contained in a sentence, and $I_P$ is linked to short-term memory capacity [1] [2] [3], this channel describes how the character addresses the short-term memory of the two audiences.
By eliminating x from Equation (2), we get the linear relationship between the input number of sentences (or interpunctions) in text $Y_k$ (now the reference, input text) and the number of sentences (or interpunctions) in text $Y_j$ (now the output text):

$y_j = m_{jk} y_k + n_{jk}$ (3)

Compared to the new reference text $Y_k$, the slope $m_{jk}$ is given by:

$m_{jk} = m_j / m_k$ (4)

The noise source that produces the new correlation coefficient between $y_j$ and $y_k$ is given by:

$n_{jk} = n_j - m_{jk} n_k$ (5)
The “regression noise-to-signal ratio” $R_m$, due to $m_{jk}$, of the new channel is given by [2]:

$R_m = (1 - m_{jk})^2$ (6)
The unknown correlation coefficient $r_{jk}$ between $y_j$ and $y_k$ is given by [49]:

$r_{jk} = r_j r_k + \sqrt{(1 - r_j^2)(1 - r_k^2)}$ (7)
The “correlation noise-to-signal ratio” $R_r$, due to $r_{jk}$, of the new channel from text $Y_k$ to text $Y_j$ is given by [2]:

$R_r = (1 - r_{jk}^2) / r_{jk}^2$ (8)
Because the two noise sources are disjoint and additive, the total noise-to-signal ratio of the channel connecting text $Y_k$ to text $Y_j$, for a given stochastic variable, is given by [2]:

$R = R_m + R_r$ (9)
Notice that Equation (9) can be represented graphically [2]. Finally, the total signal-to-noise ratio is given by:

$\Gamma = 1 / R$ (10a)

$\Gamma_{dB} = 10 \log_{10} \Gamma$ (10b)
Of course, we expect, and it is so in the following, that no channel can yield $m_{jk} = 1$ and $r_{jk} = 1$, hence $\Gamma = \infty$, a case referred to as the ideal channel, unless a text is compared with itself (self-comparison, self-channel). In practice, we always find $m_{jk} \neq 1$ and $r_{jk} < 1$. The slope $m_{jk}$ measures the multiplicative “bias” of the dependent variable compared to the independent variable; the correlation coefficient $r_{jk}$ measures how “precise” the linear best fit is.
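The full channel computation can be condensed in a few lines; the sketch below assumes the slope ratio of Equation (4), the correlation combination of Equation (7) and the additive noise decomposition of Equation (9), and the numerical slopes and correlation coefficients are illustrative, not taken from the texts studied here:

```python
import math

def channel_snr_db(m_k, r_k, m_j, r_j):
    """Total signal-to-noise ratio (dB) of the linguistic channel from
    text Y_k (input: slope m_k, correlation r_k) to text Y_j (output)."""
    m_jk = m_j / m_k                                            # Equation (4)
    r_jk = r_j * r_k + math.sqrt((1 - r_j**2) * (1 - r_k**2))   # Equation (7)
    R_m = (1 - m_jk) ** 2                                       # Equation (6)
    R_r = (1 - r_jk**2) / r_jk**2                               # Equation (8)
    return 10 * math.log10(1 / (R_m + R_r))                     # Eqs. (9)-(10)

# Illustrative parameters. Note that a self-channel (identical input and
# output parameters) gives R = 0, i.e. the ideal channel with infinite SNR.
forward = channel_snr_db(0.0661, 0.95, 0.0602, 0.93)
reverse = channel_snr_db(0.0602, 0.93, 0.0661, 0.95)
# forward and reverse differ: linguistic channels are asymmetric.
```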
In conclusion, the slope $m_{jk}$ is the source of the regression noise and the correlation coefficient $r_{jk}$ is the source of the correlation noise of the channel. Before proceeding with the study, in the next Section we sketch the biography of Maria Valtorta and introduce her literary corpus.
3. Maria Valtorta and Her Literary Corpus
Maria Valtorta (1897-1961) was an Italian mystic writer active in the years of World War II. Her voluminous literary work—based, as she claims, on mystic visions, whose assessment is of course beyond science and our investigation—contains a detailed life of Jesus Christ. A rigorous and scientific analysis of her literary corpus on Jesus’ life—narrated in her main work Il Vangelo come mi è stato rivelato (The Gospel as revealed to me, in the following EMV), published in 10 volumes [50]—has evidenced the presence of many data on facts and events that allegedly occurred 2000 years ago in Palestine, well beyond her knowledge, culture and skills [45] [46] [47]. She reports, in real time, what she sees and hears during many mystical visions—as she claims—over a period lasting several years [48]. She mentions towns, villages, buildings and palaces, Roman roads, mountain tracks, the river Jordan, ports of the Mediterranean, lakes (Tiberias, ancient Meron), creeks, mountains and hills, trees and flowers, fragrances and perfumes, dresses, food, weather, sceneries and monuments of Palestine at Jesus’ times, a geographical area she never visited.
Bedridden since 1934 because she was paralyzed below the waist, she wrote on a small stand, sitting on the bed with her shoulders supported by pillows, in Viareggio (Tuscany), during World War II and the few following years. In spite of the complete lack of data available in her times, every time some of the data she reports have been checked, they have proved unexpectedly correct, sometimes even anticipating what scholars would find years after her writings [47] [50] [51] [52].
She wrote in Italian 13,193 pages in 122 school notebooks [53], without making any correction, with a set of fountain pens always filled with ink because she did not know when the alleged visions would come. These notebooks contain not only the events now published in the EMV, but also many other mystic writings, as she intercalated the pages describing the events of Jesus’ life with many pages on various topics, including dictations and monologues addressed to her by the alleged Jesus (texts referred to below as Jesus says) or by other heavenly persons. In the following we drop the adjective “alleged”, although we always mean it throughout the paper: it is not our duty, or task, to declare or establish that her “visions” were real, because this is beyond the realm of science.
In this voluminous literary corpus, the character Jesus addresses different audiences: friends, disciples, people in parables and extempore sermons, congregations in Synagogues and at the Temple in Jerusalem. The character delivers two well-organized and coherent series of sermons at a locality named Clear Water, in the Jordan River Valley, and at a locality that Maria Valtorta describes in great detail and which closely resembles the Horns of Hattin (Galilee). Some of the content spoken at the Horns of Hattin is reported in the gospel according to Matthew (Mt 5) and universally known as the Sermon on the Mount, although the “Sermon” reported in the EMV lasts a week, not a single day [46].
Table 1 reports the average values of the linguistic parameters in the indicated texts attributed to Jesus, mostly extracted from the EMV, and already studied (except Jesus says) for their setting, topics and duration in [46].
We first study the averages of the linguistic variables reported in Table 1 by using a vector representation [1] [2] [3] [54], which gives an overall view of how “close” the texts are.
4. The Vectors Plane of Linguistic Variables
The linguistic averages of Table 1 can be used to assess how “close” texts are in a Cartesian plane, by using a graphical tool which effectively compares different literary texts seen as vectors, a representation discussed in detail in [1] [2] [3] and briefly recalled here.
Let us consider the six vectors $\mathbf{R}_1, \ldots, \mathbf{R}_6$, whose components are the six distinct pairs of the four average linguistic parameters of Table 1 (CP, PF, IP and MF), and their resulting vector:

$\mathbf{R} = \sum_{i=1}^{6} \mathbf{R}_i$ (11)
The choice of which parameter represents the component on the abscissa and which on the ordinate is not important. Once the choice is made, the numerical results will depend on it, but the relative comparisons and general conclusions will not.
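A hedged sketch of this construction (the pairing of components shown here is illustrative; the actual component assignment is the one defined in [1] [2] [3] [54], and the numerical averages below are not those of Table 1):

```python
from itertools import combinations

def resulting_vector(cp, pf, ip, mf):
    """Sum of six vectors whose components pair the four linguistic
    parameters (Equation (11)); the pairing order here is illustrative."""
    pairs = list(combinations((cp, pf, ip, mf), 2))  # six distinct pairs
    x = sum(a for a, _ in pairs)                     # abscissa of the resultant
    y = sum(b for _, b in pairs)                     # ordinate of the resultant
    return (x, y)

def normalize(point, origin, unit):
    """Map 'origin' (e.g. Clear Water) to (0, 0) and 'unit' (e.g. Jesus
    says) to (1, 1), as done for Figure 1."""
    return tuple((p - o) / (u - o) for p, o, u in zip(point, origin, unit))

# Illustrative averages (CP, PF, IP, MF) for three hypothetical texts:
text = resulting_vector(4.5, 16.3, 7.4, 2.2)
cw = resulting_vector(4.4, 15.0, 7.0, 2.1)   # plays the role of Clear Water
js = resulting_vector(4.7, 19.0, 8.0, 2.4)   # plays the role of Jesus says
coords = normalize(text, cw, js)
```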
Figure 1 shows the resulting vector of Equation (11) for the texts listed in Table 1, with coordinates normalized so that Clear Water (CW) is located at the origin (0, 0) and Jesus says (JS) at (1, 1). As already observed [46], we can notice, very clearly, that the data concerning the sermons
Table 1. Total number of words and sentences in the texts addressed to the indicated audiences (the number in parentheses is the number of text subdivisions considered in calculating averages and regression lines) and average number of: characters per word (CP), words per sentence (PF), words per punctuation mark (interpunction)—which coincides with the word interval (IP) [1]—and punctuation marks per sentence (MF), which is also the number of word intervals contained in a sentence.
at Clear Water (delivered in 14 days) and at the Horns of Hattin (delivered in 7 days) are displaced from the other texts. They seem to belong to a set of data with different linguistic statistics. This striking difference underlines the peculiarity of these two coordinated and apparently planned series of sermons, compared to the other extempore sermons. Notice also that Jesus says is very much displaced from all the other texts. It seems that the character Jesus speaks quite differently to a modern listener (i.e., Maria Valtorta) than when he speaks to people of his (alleged) own historical time. The clear distinction of Jesus says from the other texts will be further analyzed below.
Now, if Maria Valtorta’s claim could be accepted—i.e. she had visions of Jesus’ public life events and received Jesus’ dictations and monologues—the differences just underlined would not be surprising because, in this case, Jesus would be a real person living in his times when he speaks to people, and a contemporary person when he speaks to Maria Valtorta. However, because we, as scientists, are not allowed to accept her claim, we must therefore conclude that she is a very capable writer, because she distinguishes audiences, settings and topics in which the character Jesus acts.
Besides the vector analysis shown in Figure 1, in the next Section we study
Figure 1. Coordinates x and y of the resulting vector of Equation (11) of each literary work, normalized to the coordinates of the sermons at Clear Water (CW, assumed as the origin, coordinates (0, 0)) and of the dictations addressed to Maria Valtorta, Jesus says (JS, located at (1, 1)). P: Parables; D: Disciples; PP: People; S: Synagogues; T: Temple; CW: Clear Water; HA: Horns of Hattin.
some communication channels linked to specific linguistic variables, such as sentences and interpunctions.
5. Theoretical Signal-to-Noise Ratio in Literary Communication Channels
In this Section we study how sentences and interpunctions build specific communication channels in a literary text, and calculate their signal-to-noise ratio defined in Section 2.
To apply the theory of Section 2, we need the slope m and the correlation coefficient r of the regression line between: (a) the number of sentences nS and the number of words nW, to study the “sentences channel”; (b) the number of interpunctions per sentence MF and the number of sentences nS, to study the “interpunctions channel”.
Table 2 reports the slope m and the correlation coefficient r of the regression lines for the indicated texts. For example, in Hattin, text blocks of 100 words contain on average 6.61 sentences and 2.1903 × 6.61 = 14.48 interpunctions (punctuation marks).
Figures 2-7 show, for some texts, the scatterplots and their regression lines. By looking at these figures, we can see at a glance which texts have very similar regression lines. It is more difficult to see whether the scattering of the data is similar or not.
For example, in Figure 3 the regression lines of Disciples (cyan) and People (magenta) coincide. In other words, a given number of words contains, on the average, the same number of sentences in both texts. Therefore, the character Jesus addresses the two audiences with sentences of about the same average length; see also the average values of PF in Table 1, PF = 16.30 and PF = 17.10, respectively. The correlation coefficients are also very similar, r = 0.9462 and r = 0.9397. According to the theory of Section 2, the signal-to-noise ratio of the
Table 2. Line slope m and correlation coefficient r of the regression lines between the indicated variables, in the texts listed. Correlation coefficients are reported with 4 decimal digits because some coefficients differ only from the third digit onward.
Figure 2. Scatterplots and regression line between words (independent variable) and sentences (dependent variable) in the following texts: Hattin (blue squares and blue line); Clear Water (red squares and red line); Synagogues (black circles and black line).
Figure 3. Scatterplots and regression line between words (independent variable) and sentences (dependent variable) in the following texts: Disciples (cyan circles and cyan line); People (magenta circles and magenta line); Jesus says (black dots and black line). Notice that the regression lines of Disciples (cyan) and People (magenta) coincide.
Figure 4. Scatterplots and regression line between words (independent variable) and sentences (dependent variable) in the following texts: Hattin (blue squares and blue line); Parables (green circles and green line); Temple (magenta circles and magenta line).
Figure 5. Scatterplots and regression line between sentences (independent variable) and punctuation marks (interpunctions, dependent variable) in the following texts: Hattin (blue squares and blue line); Clear Water (red squares and red line); Synagogues (black circles and black line).
Figure 6. Scatterplots and regression line between sentences (independent variable) and punctuation marks (interpunctions, dependent variable) in the following texts: Hattin (blue squares and blue line); Parables (green circles and green line); Temple (magenta triangles and magenta line). Notice that the regression lines of Hattin (blue) and Parables (green) coincide.
Figure 7. Scatterplots and regression line between sentences (independent variable) and punctuation marks (dependent variable) in the following texts: Disciples (cyan circles and cyan line); People (magenta circles and magenta line); Jesus says (black dots and black line). Notice that the three regression lines practically coincide.
sentences channel obtainable—i.e. the channel that transfers (translates) the number of sentences of the input text into the number of sentences of the output text—should be quite large, as we will show below (Table A4).
Similar results can be found in the scatterplots of interpunctions versus sentences. For example, in Figure 7 we can notice that the regression lines of Disciples (cyan), People (magenta) and Jesus says (black) practically coincide. But the correlation coefficients are quite different: r = 0.9419 in Jesus says (Table 2, rightmost column) against r = 0.9586 in Disciples and r = 0.9567 in People.
Regression lines, however, describe only one aspect of the relationship, namely the average values—recall that the average values, such as those shown in Table 1, lie on the regression line—and do not show the other aspect of the relationship, namely the scattering of the data, which may not be the same even when two regression lines almost coincide. The theory of linguistic channels recalled in Section 2, on the contrary, by considering both slopes and correlation coefficients, provides a reliable tool for comparing two sets of data, each described by the linear relationship of Equation (2), according, for example, to the signal-to-noise ratio recalled in Section 2 or to the Shannon channel capacity [2].
Let us calculate the theoretical signal-to-noise ratios obtained in the sentences and interpunctions channels according to Section 2. Table 3 (sentences channel) and Table 4 (interpunctions channel) report the theoretical signal-to-noise ratio $\Gamma_{th}$ (dB) in the channel between the (input) text indicated in the first column and the (output) text indicated in the first line.
For example, in the sentences channel (Table 3), from Parables (input) to Hattin (output) we read $\Gamma_{th}$ = 17.4 dB—54.7 in linear units—and $\Gamma_{th}$ = 16.5 dB—44.5 in linear units—in the reverse channel from Hattin (input) to Parables (output), showing asymmetry, a characteristic of linguistic communication channels [2] [3]. In the interpunctions channel (Table 4), from Parables
Table 3. Sentences channel. Theoretical signal-to-noise ratio $\Gamma_{th}$ (dB) in the channel between the (input) text indicated in the first column and the (output) text indicated in the first line. For example, the entry at row Parables and column Clear Water gives $\Gamma_{th}$ (dB) of the channel from Parables (input) to Clear Water (output).
Table 4. Interpunctions channel. Theoretical signal-to-noise ratio $\Gamma_{th}$ (dB) in the channel between the (input) text indicated in the first column and the (output) text indicated in the first line. For example, the entry at row Parables and column Clear Water gives $\Gamma_{th}$ (dB) of the channel from Parables (input) to Clear Water (output).
(input) to Hattin (output), $\Gamma_{th}$ = 25.8 dB—378.4 in linear units—and $\Gamma_{th}$ = 26.1 dB—410.2 in linear units—in the reverse channel from Hattin (input) to Parables (output). Notice the large $\Gamma_{th}$ = 40.5 dB (11,220 in linear units) in the interpunctions channels Disciples ↔ People (Table 4).
Besides the asymmetry of the channels, these results say, for example, that the two texts in the channel Hattin ↔ Parables are more similar in the interpunctions channel than in the sentences channel. In other words, the regression lines and the scattering (i.e., the “noise”) are more similar when the scatterplots of the interpunctions channels—if they were explicitly available—are compared than when the scatterplots of the sentences channels are compared. Therefore, the theory of linguistic channels can finely describe differences that, to a first approximation—as are the average values just reported and the regression line—would be largely lost. In conclusion, multiple linguistic channels can describe the “fine tuning” that a literary author can use to distinguish characters, or the same character in different situations, as Maria Valtorta does.
However, as discussed in [3], an important issue here arises because of the different sample size used in calculating the regression line parameters listed in Table 2. In the next Section, we recall this issue and show how to deal with it.
6. Experimental Signal-to-Noise Ratio
Because of the different sample sizes used in calculating the regression parameters listed in Table 2, the slope m and the correlation coefficient r of a regression line, being stochastic variables, are characterized by average values (those reported in Table 2) and standard deviations, which depend on the sample size [55]. The theory would, of course, yield improved estimates of $\Gamma_{th}$ if the sample size were larger. With a small sample size, the standard deviations of m and r can give too large a variation in the $\Gamma_{th}$ predicted by the theory—see the sensitivity of this parameter to the slope m and the correlation coefficient r in [3]. Only Jesus says is based on a relatively large sample size, 302 pairs in the scatterplot. To avoid this inaccuracy—due to the small sample size from which the regression lines are calculated, not to the theory of Section 2—in [3] we have defined, used and discussed a “renormalization” based on Monte Carlo simulations, whose results we consider as “experimental”.
Now, we first recall the steps of the Monte Carlo simulation to be performed, and then we report the results concerning the sentences channel and the interpunctions channel.
6.1. Monte Carlo Simulations
For example, let us take Hattin as the output text and the others, in turn, as input texts. The steps of the Monte Carlo simulation, for example in the sentences channel, are the following:
1) Generate 7 independent numbers (the number of texts—i.e. sermons—in Hattin) from a discrete uniform probability distribution in the range 1 to 7, with replacement—i.e., a sermon can be selected more than once.
2) “Write” another possible “Hattin” with the new 7 sermons, e.g. the sequence 2; 1; 6; …, hence take sermon 2, followed by sermon 1, sermon 6, et cetera, up to seven sermons. The text of a sermon can appear twice (with probability 1/7²), three times (with probability 1/7³), et cetera, and the new Hattin can contain a number of words greater or smaller than the original text, on the average (the differences are small and do not affect the statistical results).
3) Calculate the parameters $m_j$ and $r_j$ of the regression line between words (independent variable) and sentences (dependent variable) in the new Hattin.

4) Compare $m_j$ and $r_j$ of the new Hattin (output, dependent text) with the parameters of any other text (input, independent text, $m_k$ and $r_k$, values listed in Table 2), in the cross-channels so defined, including the original Hattin (self-channel).
5) Calculate $m_{jk}$, $r_{jk}$ and $\Gamma$ of the cross-channels (linking sentences to sentences), according to the theory of Section 2.
6) Consider the values of $\Gamma$ so obtained as “experimental” results $\Gamma_{ex}$, to be compared with the theoretical results of Section 5. Notice that it is not necessary to generate new Clear Water texts, et cetera, because we compare the experimental results with the theoretical results; therefore the input $m_k$ and $r_k$ must be the same, namely those of Clear Water, et cetera. A new Clear Water, et cetera, is generated in the reverse channel.
7) Repeat steps 1 to 6 many times (we did it 5000 times).
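The steps above can be sketched as follows; the sketch is self-contained (the regression fit and the channel formulas repeat those of Section 2), and the sermon data are illustrative, not the actual Hattin word and sentence counts:

```python
import math
import random

def origin_fit(pairs):
    """Slope m of the regression line through the origin and the
    correlation coefficient r for a list of (x, y) samples."""
    m = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    r = cov / math.sqrt(vx * vy)
    return m, max(-1.0, min(1.0, r))       # clamp r against rounding

def monte_carlo_snr_db(sermons, m_k, r_k, runs=5000, seed=1):
    """Steps 1-7: resample the output text's sermons with replacement,
    refit (m_j, r_j), and average the 'experimental' SNR (dB) against
    the fixed input parameters (m_k, r_k)."""
    rng = random.Random(seed)
    n = len(sermons)
    vals = []
    for _ in range(runs):
        resampled = [sermons[rng.randrange(n)] for _ in range(n)]  # steps 1-2
        if len(set(resampled)) < 2:        # degenerate draw: skip it
            continue
        m_j, r_j = origin_fit(resampled)   # step 3
        m_jk = m_j / m_k                   # steps 4-5, as in Section 2
        r_jk = r_j * r_k + math.sqrt((1 - r_j**2) * (1 - r_k**2))
        R = (1 - m_jk) ** 2 + (1 - r_jk**2) / r_jk**2
        if R < 1e-12:                      # (near-)ideal channel: skip it
            continue
        vals.append(10 * math.log10(1 / R))
    return sum(vals) / len(vals)           # step 6: average experimental SNR

# Illustrative (words, sentences) totals for 7 "sermons":
sermons = [(120, 8), (250, 16), (310, 21), (480, 30),
           (560, 37), (400, 27), (200, 13)]
m_k, r_k = origin_fit(sermons)             # here: a self-channel input
gamma_ex = monte_carlo_snr_db(sermons, m_k, r_k, runs=300, seed=7)
```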
Besides the usefulness of the simulation as a “renormalization” tool, shown in [3], the new sermons obtained in step 2 might have been “pronounced” by Jesus on the same occasion, because they maintain the statistical relationships between the linguistic variables of the original sermons.
In conclusion, the Monte Carlo simulation should take care of the inaccuracy in estimating slope and correlation coefficient due to a small sample size.
6.2. Sentences Channel
For the sentences channel, Table 5 shows the results for Hattin. Appendix A reports the results for all other texts.
The results in Table 5 clearly show the impact of $m_{jk}$ and $r_{jk}$ on $\Gamma_{ex}$. For example, although the correlation coefficient in Temple → Hattin is very close to that in Jesus says → Hattin, the average $\Gamma_{ex}$ are quite different, namely 8.3 dB (i.e., 6.7 in linear units) in Temple → Hattin and 12.2 dB (16.5 in linear units) in Jesus says → Hattin, the difference being mainly due to the different slopes: $m_{jk}$ = 1.317 in Temple → Hattin (i.e., 100 sentences in Temple are “translated” into 131.7 sentences in Hattin for the same number of words) against a slope much closer to unity in Jesus says → Hattin. Similar observations can be made when slopes are very close but correlation coefficients are not.
Figure 8 and Figure 9 show the scatterplots between $\Gamma_{th}$ and $\Gamma_{ex}$ for all texts. From them we can notice that $\Gamma_{th}$ and $\Gamma_{ex}$ agree quite well up to about 20 - 25 dB, beyond which saturation occurs, a trend also shown in [3]. In other words, we can be confident in the reliability of $\Gamma_{th}$ up to about 20 - 25 dB. For larger values, $\Gamma_{th}$ can also be reliable, but in this case a deeper statistical assessment would be necessary with regard to the sample size of input and output texts.
Figure 8 and Figure 9 also show channels that coincide. For example, in Figure 8, lower panel, rightmost figure, the channels Hattin → Parables, People → Parables and Disciples → Parables coincide. In this Figure, we can also notice that the channel with the largest $\Gamma_{ex}$ is Clear Water → Parables. In other words, for sentences, Clear Water is the text closest to Parables, indicating that the character Jesus addresses the two audiences similarly. This result is
Table 5. Sentences channel. Theoretical $\Gamma_{th}$ and experimental (Monte Carlo) $\Gamma_{ex}$ in the indicated cross-channels, obtained by assuming Hattin as output text. The standard deviations are shown in parentheses. Hattin → Hattin refers to its self-channel. The average values and standard deviations of $m_{jk}$ and $r_{jk}$ refer to the estimated regression lines between the number of sentences in Hattin (output, dependent variable) and the number of sentences in the indicated texts (input, independent variable). We report 4 decimal digits in correlation coefficients because some values differ only from the third digit onward.
Figure 8. Sentences channels. The title of each panel refers to the input text. Scatterplots between the average $\Gamma_{ex}$ (Monte Carlo) and $\Gamma_{th}$. Hattin (blue square); Clear Water (red square); Temple (black triangle); Parables (green circle); Disciples (cyan circle); Synagogues (red circle); People (magenta circle); Jesus says (black triangle).
due to the combination of slope and correlation coefficient.
The higher $\Gamma_{ex}$, the more similar the texts, as can be seen, for example, in Figure 9 for People and Disciples (upper panel, leftmost figure; lower panel, leftmost figure).
In Section 7, we objectively compare channels and texts according to the likeness index IL, defined in [3].
6.3. Interpunctions Channel
Table 6 shows the results for Hattin in the interpunctions channel. Figure 10 and Figure 11 show the scatterplots between $\Gamma_{th}$ and $\Gamma_{ex}$ for all texts (Appendix B reports the tables for the other texts). We can notice, for example, that Hattin (Figure 10, upper panel, left) and Clear Water (Figure 10, upper panel, right) are the texts closest to Parables (green circles).
As in the sentences channel, also in the interpunctions channel
and agree quite well up to about 20 - 25 dB, beyond which saturation occurs, as is clearly shown in Figure 11, upper (Disciples) and lower (People) panels, left.
Notice that in general both
and
tend to be larger than those in sentences channel. Because this channel is connected with the words interval IP, and therefore with the short-term memory capacity [1] [2], this result may highlight the fact that most audiences are addressed by distributing the interpunctions
Figure 9. Sentences channels. The title refers to the input text. Scatterplots between the average
(Monte Carlo) and
. Hattin (blue square); Clear Water (red square); Temple (black triangle); Parables (green circle); Disciples (cyan circle);Synagogues (red circle); People (magenta circle); Jesus says (black triangle).
Table 6. Interpunctions channel. Theoretical
and experimental (Monte Carlo)
in the indicated cross-channels, obtained by assuming Hattin as output text. The standard deviations are shown in parentheses. Hattin
refers to its self-channel. The average values and standard deviations of
and
refer to the estimated regression lines between the number of sentences in Hattin (output, dependent variable) and the number of sentences in the indicated texts (input, independent variable). We report 4 decimal digits in correlation coefficients because some values differ only from the third digit.
Figure 10. Interpunctions channels. The title refers to the input text. Scatterplots between the average
(Monte Carlo) and
. Hattin (blue square); Clear Water (red square); Temple (black triangle); Parables (green circle); Disciples (cyan circle);Synagogues (red circle); People (magenta circle); Jesus says (black triangle).
Figure 11. Interpunctions channels. The title refers to the input text. Scatterplots between the average
(Monte Carlo) and
. Hattin (blue square); Clear Water (red square); Temple (black triangle); Parables (green circle); Disciples (cyan circle);Synagogues (red circle); People (magenta circle); Jesus says (black triangle).
within a sentence in a similar way, except for Temple (
) and Jesus says (
), see Table 1.
In Section 7, we objectively compare channels and texts according to the likeness index IL, defined in [3].
7. Likeness Index and Symmetry Index
The likeness index IL is based on probability theory and allows us to "measure" how similar a linguistic communication channel is to another channel. In other words, the likeness index measures how much a text can be "mistaken", mathematically, for another text, e.g., Hattin for Clear Water, by studying self- and cross-channels and their signal-to-noise ratios Γ, whose probability density functions are modelled as Gaussian, with the average values and standard deviations reported in Table 5 and Table 6. The probability problem is binary because a decision must be taken between two alternatives; its theory is fully developed in [3].
The likeness index is bounded in the range 0 ≤ IL ≤ 1; IL = 0 means totally independent texts, IL = 1 means totally dependent texts.
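The binary decision underlying the likeness index can be illustrated with a minimal sketch: when two channels have Gaussian signal-to-noise ratios with equal standard deviations, an optimal midpoint threshold confuses them with probability Q(d/2), where d is their separation measured in standard deviations. This only illustrates the decision problem; the exact definition of IL is the one given in [3].

```python
import math

def mistake_probability(mu_self, mu_cross, sigma):
    """Probability of confusing two Gaussian-distributed signal-to-noise
    ratios with equal standard deviation sigma, using the optimal midpoint
    threshold: P_e = Q(d / 2), with d = |mu_self - mu_cross| / sigma.
    """
    d = abs(mu_self - mu_cross) / sigma
    # Q(x) = 0.5 * erfc(x / sqrt(2)) is the Gaussian tail probability.
    return 0.5 * math.erfc(d / (2 * math.sqrt(2)))

# Two channels whose SNRs (in dB) differ by 6 dB, with sigma = 2 dB:
p = mistake_probability(25.0, 19.0, 2.0)
```

As the separation between the self-channel and the cross-channel grows, the probability of confusing the two texts decays rapidly, which is the qualitative behaviour described in the text.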
Although IL depends on both the average value and the standard deviation of Γ, a first assessment can be seen in Figure 12, which shows IL versus the difference between Γ of the self-channel (usually the largest value) and Γ of a cross-channel (smaller value). Clearly, as the difference between the two values of Γ increases, IL rapidly decreases. The scattering of the values in Figure 12 is due to the different standard deviations. A 6-dB difference (i.e., in linear units, Γ of the self-channel is 4 times larger than Γ of a cross-channel) already gives a value of IL small enough to be assumed as a threshold below which two texts depend very little on each other. We report next the full results concerning the sentences channels and the interpunctions channels.
Figure 12. Scatterplot of the likeness index IL versus the difference between Γ of the self-channel (larger value) and Γ of the cross-channel (smaller value) in the sentences channels (blue circles) and interpunctions channels (red circles), for all texts.
7.1. Sentences Channel
Table 7 reports IL between the indicated texts in the sentences channels. For example, IL is large in the channel Parables → Hattin, while it is much smaller in the reverse channel Hattin → Parables, a large asymmetry. The largest values are found in the practically symmetrical channel People ↔ Disciples (see Table 7).
Let us discuss the results in more detail. Consider, for example, the channels to Hattin (column Hattin of Table 7). We see that Hattin is very similar to Parables, People and Disciples, and fairly similar to Clear Water (see the IL values in Table 7). This means that in every new Hattin simulated in step 2 of the Monte Carlo algorithm of Section 6.1, the regression line between sentences and words is very similar to that of the input text Parables, People, Disciples or Clear Water, so that the theory of Section 2 produces, in the end, these large values of IL. In other words, Parables, People and Disciples in particular are, with a large confidence measured by IL, "contained" in Hattin. Notice that the reverse situation is not true: because of the large asymmetry, the IL values of the reverse channels to Parables, People, Disciples and Clear Water are much smaller (Table 7).
Table 7. Likeness index IL between the indicated texts, sentences channel. The text in the first row indicates the output text; the text in the first column indicates the input text. For example, the entry in row Parables, column Hattin gives IL of the channel Parables → Hattin, while the entry in row Hattin, column Parables gives IL of the reverse channel Hattin → Parables.
Other interesting observations can be made:
1) People contains Disciples and vice versa. The two sets of data are practically the same and could be merged.
2) Clear Water barely contains Parables, but not vice versa. Clear Water does not contain any other text.
3) Jesus says contains Temple, but not vice versa. The "modern" Jesus includes the "ancient" Jesus, but not vice versa: the ancient Jesus (column Jesus says) does not speak as the "modern" Jesus does.
4) Hattin "contains", as already mentioned, all extempore sermons/speeches delivered to audiences made up of unpredictable listeners (Parables, People and Disciples), but it does not contain Temple and Synagogues. In other words, in these institutional sites Jesus seems to speak differently than at the Horns of Hattin, where he presents his Manifesto [46]. Because Clear Water came before Hattin (see the alleged chronology in [46]), there seems to be a significant change in the oratory and statistical characteristics of the sermons delivered on the two occasions, the later one (Hattin) being the model followed afterwards by the character Jesus on other occasions.
7.2. Symmetry Index
As mentioned above, asymmetry is typical of most linguistic channels. It is therefore useful to define a new parameter, the symmetry index IS, linked to the likeness index by the relationship:
IS = 1 − |IL,d − IL,r| / (IL,d + IL,r) (12)
In Equation (12), IL,d refers to the direct channel (e.g., Parables → Hattin) and IL,r refers to the reverse channel (e.g., Hattin → Parables).
It can be shown that the symmetry index defined in Equation (12) is bounded in the range 0 ≤ IS ≤ 1 [56]; IS = 0 means no symmetry, IS = 1 means total symmetry.
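A symmetry index with the stated properties (equal to 1 when the direct and reverse likeness indices coincide, equal to 0 when one of them vanishes) can be sketched as below. This assumes a normalized-difference form and is a hypothetical illustration, not necessarily the exact published Equation (12):

```python
def symmetry_index(il_direct, il_reverse):
    """Hypothetical symmetry index: 1 for identical direct/reverse
    likeness indices, 0 when one of the two vanishes."""
    if il_direct == il_reverse == 0.0:
        return 0.0
    return 1.0 - abs(il_direct - il_reverse) / (il_direct + il_reverse)

s = symmetry_index(0.8, 0.2)  # a strongly asymmetric channel pair
```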
Table 8 shows this index for all texts. As anticipated, the most symmetrical channel is People ↔ Disciples; the least symmetrical is Jesus says ↔ Clear Water, therefore confirming that the modern character Jesus speaks differently than the alleged ancient Jesus.
7.3. Interpunctions Channel
Table 9 reports the likeness index IL between the indicated texts in the interpunctions channels. We can notice that the largest IL is found in the channel People ↔ Disciples, in both directions, therefore confirming that People contains Disciples and vice versa also in this linguistic channel. In other words, as already observed, the character Jesus does not distinguish the two audiences.
Table 8. Symmetry index IS between the indicated texts, sentences channel. The most symmetrical channel is Disciples ↔ People; the most asymmetrical channels are Jesus says ↔ Hattin and Jesus says ↔ Temple.
Table 9. Likeness index IL between the indicated texts, interpunctions channel. The text in the first row indicates the output text; the text in the first column indicates the input text. For example, the entry in row Parables, column Hattin gives IL of the channel Parables → Hattin, while the entry in row Hattin, column Parables gives IL of the reverse channel Hattin → Parables.
In general, the likeness index of the interpunctions channels is lower than that of the sentences channels. Other observations are:
1) Hattin contains, with decreasing values of IL, Parables, Disciples and People, but not vice versa.
2) Jesus says contains People and Disciples, but not vice versa (see Table 9).
Because the interpunctions channel concerns the number of word intervals IP contained in the same number of sentences, the sermons delivered to different audiences have significantly different sentence lengths, as we can notice in the average values of PF reported in Table 1. The "fine tuning" due to the linguistic channel describes the impact of this parameter more clearly.
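The word interval IP counts the words between two consecutive interpunctions. A toy illustration is sketched below; the punctuation set and the tokenization rules assumed here are simplifications, and the precise rules of the cited theory may differ:

```python
import re

def word_intervals(text):
    """Split a text at interpunctions (punctuation marks) and count the
    words in each interval; each count is one sample of I_P, the number
    of words between two consecutive punctuation marks."""
    chunks = re.split(r"[.,;:?!]", text)
    return [len(c.split()) for c in chunks if c.split()]

ip = word_intervals("In the beginning, God created the heaven and the earth.")
```

Averaging these counts over a text gives the deep-language parameter that the interpunctions channel compares between two texts.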
Finally, Table 10 shows the symmetry index IS of Equation (12) for all texts. Again, the most symmetrical channel is People ↔ Disciples; the least symmetrical involves Jesus says, therefore confirming that the modern character Jesus speaks differently than the ancient Jesus.
8. Conclusions
We have applied the theory developed in [1] [2] [3], and recalled in Section 2, based on regression lines, to compare how a literary character speaks to different audiences by diversifying and adjusting two important linguistic communication channels, namely the "sentences channel" and the "interpunctions channel". The theory can "measure" how an author shapes a character speaking to different audiences by modulating mainly deep-language parameters.
To show the power of the theory, we have applied it to the large literary corpus written by an Italian mystic of the 20th century, Maria Valtorta. In this voluminous corpus, the character Jesus addresses different audiences (friends, disciples, people) and delivers extempore or planned sermons.
Because the estimates of the slope and of the correlation coefficient of a regression line, on which the theory is based, depend on sample size, we have used a "renormalization" based on Monte Carlo simulations [3], and considered the resulting signal-to-noise ratios of the channels as "experimental".
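This renormalization can be illustrated with a bootstrap-style sketch that resamples matched sentence counts and collects the distributions of slope and correlation coefficient; the actual procedure is defined in [3] and may differ in detail:

```python
import random
import statistics

def monte_carlo_regression(x, y, runs=5000, seed=1):
    """Resample matched (input, output) sentence counts with replacement
    and collect the distribution of slope and correlation coefficient,
    so that estimates from texts of different sizes become comparable."""
    rng = random.Random(seed)
    slopes, corrs = [], []
    n = len(x)
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        if sxx > 0 and syy > 0:  # skip degenerate resamples
            slopes.append(sxy / sxx)
            corrs.append(sxy / (sxx * syy) ** 0.5)
    return (statistics.fmean(slopes), statistics.stdev(slopes),
            statistics.fmean(corrs), statistics.stdev(corrs))

# Toy matched sentence counts for two hypothetical texts.
x = [12, 18, 25, 31, 40, 22, 35, 28]
y = [13, 17, 27, 30, 42, 21, 36, 29]
m_avg, m_std, r_avg, r_std = monte_carlo_regression(x, y, runs=2000)
```

The averages and standard deviations returned by such a procedure are what the tables report in parentheses, and what feeds the Gaussian model of the signal-to-noise ratios.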
The likeness index IL, ranging between 0 and 1, defined in [3] and based on probability theory, allows us to "measure" how similar a linguistic communication channel is to another channel, i.e. it measures how much a text can be "mistaken", mathematically, for another text by studying self- and cross-channels and their signal-to-noise ratios.
Table 10. Symmetry index IS between the indicated texts, interpunctions channel. The most symmetrical channel is Disciples ↔ People; the most asymmetrical channel is Jesus says ↔ Temple.
Although IL depends on both the average value and the standard deviation of the experimental signal-to-noise ratio Γ_ex, a first assessment is given by the difference between Γ of the self-channel (usually the largest value) and Γ of a cross-channel (smaller value). As this difference increases, IL rapidly decreases. A 6-dB difference already gives a value of IL small enough to be assumed as a threshold below which two texts depend very little on each other.
As discussed in [2] [3], asymmetry is typical of most linguistic channels. The symmetry index IS defined in the paper ranges between 0 and 1. In very few channels IS is close to 1, indicating that the character Jesus addresses the two audiences as if they were indistinguishable; in most channels IS is much smaller, indicating that Jesus addresses the two audiences quite differently.
In conclusion, multiple linguistic channels can describe the “fine tuning” that a literary author can use to distinguish characters or the same character in different situations, as Maria Valtorta did. Of course, a similar approach can be used to study any literary corpus written in an alphabetical language.
Appendix A
In this Appendix we report the full data bank of the experimental (Monte Carlo) average and standard deviation of Γ_ex in the indicated cross-channels, obtained after 5000 simulations in the sentences channels. The standard deviations are shown in parentheses. The average values and standard deviations of the slope m and of the correlation coefficient r refer to the regression lines calculated between the number of sentences in the output text indicated in the table caption (dependent variable) and the number of sentences in the input texts indicated in column 1 (independent variable). Correlation coefficients are reported with 4 decimal digits because some values differ only from the third digit onwards (Tables A1-A7).
Table A1. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Clear Water as output text. The standard deviations are shown in parentheses.
Table A2. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Temple as output text. The standard deviations are shown in parentheses.
Table A3. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Parables as output text. The standard deviations are shown in parentheses.
Table A4. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Disciples as output text. The standard deviations are shown in parentheses.
Table A5. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Synagogues as output text. The standard deviations are shown in parentheses.
Table A6. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming People as output text. The standard deviations are shown in parentheses.
Table A7. Sentences channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Jesus says as output text. The standard deviations are shown in parentheses.
Appendix B
In this Appendix we report the full data bank of the experimental (Monte Carlo) average and standard deviation of Γ_ex in the indicated cross-channels, obtained after 5000 simulations in the interpunctions channels. The standard deviations are shown in parentheses. The average values and standard deviations of the slope m and of the correlation coefficient r refer to the regression lines calculated between the number of interpunctions in the output text indicated in the table caption (dependent variable) and the number of interpunctions in the input texts indicated in column 1 (independent variable). Correlation coefficients are reported with 4 decimal digits because some values differ only from the third digit onwards (Tables B1-B7).
Table B1. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Clear Water as output text. The standard deviations are shown in parentheses.
Table B2. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Temple as output text. The standard deviations are shown in parentheses.
Table B3. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Parables as output text. The standard deviations are shown in parentheses.
Table B4. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Disciples as output text. The standard deviations are shown in parentheses.
Table B5. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Synagogues as output text. The standard deviations are shown in parentheses.
Table B6. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming People as output text. The standard deviations are shown in parentheses.
Table B7. Interpunctions channel. Experimental (Monte Carlo) average and standard deviation of Γ_ex (dB), and average and standard deviation of the slope m and of the correlation coefficient r, for the texts indicated in column 1 (input), obtained by assuming Jesus says as output text. The standard deviations are shown in parentheses.