Open Journal of Statistics

Volume 9, Issue 3 (June 2019)

ISSN Print: 2161-718X   ISSN Online: 2161-7198

Google-based Impact Factor: 0.53

Deep Language Statistics of Italian throughout Seven Centuries of Literature and Empirical Connections with Miller’s 7 ∓ 2 Law and Short-Term Memory

PP. 373-406
DOI: 10.4236/ojs.2019.93026
Author(s): E. Matricciani

ABSTRACT

Statistics of languages are usually calculated by counting characters, words, sentences and word rankings. Some of these random variables are also the main “ingredients” of classical readability formulae. Revisiting the readability formula of Italian, known as GULPEASE, shows that of the two terms that determine the readability index G (the semantic index GC, proportional to the number of characters per word, and the syntactic index GF, proportional to the reciprocal of the number of words per sentence), GF is dominant, because GC is, in practice, constant for any author throughout seven centuries of Italian Literature. Each author can modulate the length of sentences more freely than the length of words, and in different ways from author to author. For any author, any pair of text variables can be modelled by a linear relationship y = mx, but with a slope m that differs from author to author, except for the relationship between characters and words, which is the same for all authors. The most important relationship found in the paper is that between the short-term memory capacity, described by Miller’s “7 ± 2 law” (i.e., the number of “chunks” that an average person can hold in short-term memory ranges from 5 to 9), and the word interval, a new random variable defined as the average number of words between two successive punctuation marks. The word interval can be converted into a time interval through the average reading speed. The word interval spreads over the same range as Miller’s law, and the time interval spreads over the same range as short-term memory response times. The connection between the word interval (and the time interval) and short-term memory appears, at least empirically, justified and natural, although it needs further investigation. Technical and scientific writings (papers, essays, etc.) demand more of their readers because their words are, on average, longer, their readability index G is lower, and their word and time intervals are longer.
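The quantities named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author’s code. It uses the standard GULPEASE formula G = 89 + (300 × sentences − 10 × letters) / words, written as G = 89 − GC + GF, so that GC grows with characters per word and GF with the reciprocal of words per sentence. Its tokenization rules are assumptions: a word is a run of letters, a sentence ends at “.”, “!” or “?”, and “, ; : . ! ?” count as punctuation marks. The helper names (`count`, `gulpease`, `word_interval`, `time_interval`, `slope_through_origin`) and the reading speed used in the demo are hypothetical. The slope m of y = mx is fitted by ordinary least squares through the origin.

```python
import re
from dataclasses import dataclass

# Hypothetical tokenization rules (assumptions, not the paper's exact ones):
WORD_RE = re.compile(r"[^\W\d_]+")       # a word is a run of Unicode letters
SENTENCE_END_RE = re.compile(r"[.!?]+")  # a run of . ! ? closes one sentence
PUNCT_RE = re.compile(r"[,;:.!?]+")      # a run of marks counts as one mark


@dataclass
class TextStats:
    characters: int   # letters inside words (spaces and marks excluded)
    words: int
    sentences: int
    punctuation: int


def count(text: str) -> TextStats:
    """Count letters, words, sentences and punctuation marks in a text."""
    words = WORD_RE.findall(text)
    return TextStats(
        characters=sum(len(w) for w in words),
        words=len(words),
        sentences=len(SENTENCE_END_RE.findall(text)),
        punctuation=len(PUNCT_RE.findall(text)),
    )


def gulpease(s: TextStats) -> tuple[float, float, float]:
    """Return (G, GC, GF), with G = 89 - GC + GF."""
    gc = 10 * s.characters / s.words  # semantic index: 10 x characters per word
    gf = 300 * s.sentences / s.words  # syntactic index: 300 / (words per sentence)
    return 89 - gc + gf, gc, gf


def word_interval(s: TextStats) -> float:
    """Average number of words between two successive punctuation marks."""
    return s.words / s.punctuation


def time_interval(ip: float, words_per_minute: float) -> float:
    """Seconds needed to read one word interval at the given reading speed."""
    return ip * 60.0 / words_per_minute


def slope_through_origin(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope m of the model y = m x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    verse = ("Nel mezzo del cammin di nostra vita mi ritrovai "
             "per una selva oscura, ché la diritta via era smarrita.")
    s = count(verse)
    g, gc, gf = gulpease(s)
    ip = word_interval(s)
    # 190 words per minute is an arbitrary illustrative reading speed.
    print(f"G = {g:.1f} (GC = {gc:.1f}, GF = {gf:.1f}), IP = {ip:.1f} words, "
          f"time = {time_interval(ip, 190):.1f} s")
```

The helpers divide by the word and punctuation counts, so they assume a non-empty text containing at least one punctuation mark. Applying `slope_through_origin` to, say, sentence counts against word counts across several texts by one author gives the kind of per-author slope m the abstract describes.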
Future work on ancient languages, such as classical Greek and Latin Literature (or on the Literatures of modern languages), could give insight into the short-term memory required of their well-educated ancient readers.

Cite as:

Matricciani, E. (2019) Deep Language Statistics of Italian throughout Seven Centuries of Literature and Empirical Connections with Miller’s 7 ∓ 2 Law and Short-Term Memory. Open Journal of Statistics, 9, 373-406. doi: 10.4236/ojs.2019.93026.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.