Non-Native English Speaker Readability Metric: Reading Speed and Comprehension

This paper presents an investigation of the reading speed and reading comprehension of non-native English-speaking students using a simple analytical model. For this purpose, various readability software tools were used to estimate the average grade level of the given texts. The relationship between the scores obtained by the students and their reading speed is presented for texts of average grade level 9 and 14, set in font sizes 12 and 14. The experimental results show that reading speed and score, plotted against the students, may each be described by a linear regression. Reading speed decreases as the score decreases: the students with a higher reading speed scored better marks. More importantly, we find that the reading speed of our students is lower than that of native English speakers. Modeling readability in linear form significantly simplifies the readability analysis.


Introduction
Language is an essential tool for converting our ideas into words. It is a form of communication with two important parts: 1) the expressions or experiences that we want to express or share and 2) the suitable words that we use to convey these ideas or experiences. English as a lingua franca is a common language used for communication between speakers of different languages. When we learn the English language, we aim to develop all four major skills of the language: listening, speaking, reading, and writing.
All four skills are either receptive or transmittal skills.
Reading is an active and diligent process. For communication between a reader and a writer to be successful and productive, the prerequisite is clarity of thought and correct usage of words in the text. A piece of writing cannot achieve its purpose unless the reader understands and comprehends the text in its actual context. Reading involves understanding words and their meanings in order to interpret the text accurately. When a reader reads a text, he or she should be able to connect the words with the given situation for better understanding.
Readability is defined as the ease with which a written text can be understood by a reader. The readability of a particular text depends on the complexity of both its vocabulary and its syntax.
To measure readability, various computer-based formulas have been proposed that include only two factors [1]-[13]: 1) the number of syllables (or letters) in a word and 2) the number of words in a sentence. The results generated by these formulas are not accurate and are not always in good agreement with one another.
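As an illustration of a two-factor formula of this kind, the sketch below computes the widely published Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not claimed to be the formula used by any of the tools cited in [1]-[13]. The function names and the vowel-group syllable counter are assumptions introduced here; the crude syllable heuristic is one reason different tools can disagree.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the grade level from words per sentence and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    print(round(flesch_kincaid_grade(sample), 3))
```

Very short, simple sentences can yield a grade level below zero, which is a known property of the formula rather than an error in the sketch.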
In the present paper, we examine the reading speed V_RS and the reading comprehension of non-native English-speaking undergraduate students of our institution, using text shown in black on a plain background. Texts in different font sizes were assigned to the students. The results obtained are compared with those of similar investigations. The correlation between performance measures, such as success rate and time to complete tasks, and the students is presented.

Methodology, Experimental Result & Discussion
In order to check the readability, a planned survey was conducted in two phases.
In the first phase, a drafted text with a few direct questions was chosen to evaluate the students' reading speed. The difficulty level of the text was checked thoroughly using readability software [1]-[11] before it was given to the students. The students were asked to read the text carefully for a few minutes and write each answer in the space provided after each paragraph. The time they took to read the text was noted by the instructor. The purpose of this phase was simply to see whether the students had understood the text, taking into account the time it took them to complete the tasks.
In the second phase, after the reading speed had been monitored, a reading comprehension test was given to the students. It consisted of an elaborate, structured text followed by a variety of multiple-choice questions. The purpose of this phase was to check the students' understanding. The total time taken by the students to complete the test was also noted by the instructor.
The sample consists of 70 students, 60% male and 40% female. In terms of language proficiency, we used a placement test with texts of average grade level 9 and 14, calculated using the readability software indicated in Table 1 [1]-[11]. The selected subjects included both employed and non-employed students, and the texts were at a level best understood by university students. Based on the USA education system, a grade level is equivalent to the number of years of education a person has had.
Scores over 22 should generally be taken to mean graduate level text.
The reading speed text contained 516 words. The subjects were allowed to take the test after they confirmed that they had understood how to answer the questions. They were also told to work reasonably quickly because of the time constraint.
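The reading speed in words per minute follows directly from the length of the text and the time noted by the instructor. The short sketch below shows this arithmetic; the function name and the five-minute reading time are hypothetical values chosen for illustration, not measurements from this study.

```python
def reading_speed_wpm(word_count: int, elapsed_seconds: float) -> float:
    """Reading speed in words per minute for a text of word_count words."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return word_count * 60.0 / elapsed_seconds


TEXT_WORDS = 516            # length of the reading speed text
HYPOTHETICAL_SECONDS = 300  # assumed reading time of 5 minutes, for illustration only

if __name__ == "__main__":
    print(reading_speed_wpm(TEXT_WORDS, HYPOTHETICAL_SECONDS))
```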
Presenting text in darker colors and at a larger size, with more pixels, helped the students perform better. Black text on a plain background has been found to yield faster reading than black text on a medium-textured background, and reading that is up to 32 percent faster than with light text on a dark background [14]. It seems that larger text with more pixels draws more attention than smaller text.
In this investigation, to support rapid reading and understanding, we presented black text on a white page, giving high contrast between text and background. The font size was either 12 or 14 points.
Long sentences padded with unnecessary words are often used to express more than one idea at a time. Research indicates that brief, simple sentences are more easily and readily understood than long ones. Sentences over 20 words in length cause a loss in reading comprehension [15]. In addition to adequate contrast between text and background, it is also recommended that the number of sentences in a paragraph should not exceed six, as indicated in Table 1 ... and varies between 0 and 1.
For Figure 3 and Figure 4, after the data were collected, the samples of V_RS and of the students' scores were sorted in ascending order. The sorted V_RS and score data (y-axis) are then displayed against the corresponding students (x-axis) as a scatter plot. The x-axis coordinate x_1 corresponds to the first student's name point, and x_n corresponds to the n-th student's name point.
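The sort-then-fit procedure described above can be sketched as follows: the values are sorted, each is assigned a position 1..n on the x-axis, and a least-squares line is fitted through the resulting points. The function names and the sample values are hypothetical, and the sort direction is left as a parameter because reversing it only changes the sign of the slope (and shifts the intercept).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_sorted(values, descending=False):
    """Sort the values, use positions 1..n as x, and fit a line through them."""
    ordered = sorted(values, reverse=descending)
    positions = list(range(1, len(ordered) + 1))
    return fit_line(positions, ordered)


if __name__ == "__main__":
    # Hypothetical normalized reading speeds, not data from this study.
    speeds = [0.4, 0.2, 0.3, 0.1]
    print(fit_sorted(speeds))
    print(fit_sorted(speeds, descending=True))
```

Because the data are sorted before fitting, the points are monotonic by construction, which tends to produce a high coefficient of determination for the fitted line.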
The variation of V_RS and of the associated student scores, using font size 12, versus student name is illustrated in Figure 3 and Figure 4 for two student groups, group 1 and group 2, respectively. The experimental results show that the students with a higher V_RS scored better marks. Conversely, lower marks are observed when V_RS decreases from approximately 120 to 70 [words/min].
For reading comprehension under font sizes 12 and 14, the relationship between the students and their scores is illustrated in Figure 7 and Figure 8 for groups 1 and 2, respectively. This correlation can be described by the linear relationships given in Equations (4.1) and (4.2) for groups 1 and 2, respectively. Note that the x-axis is the name of the student, not the number of the student, and that the graph was obtained after sorting the data from the lowest to the highest score. In this investigation, the subjects were told that there was no time limit and that they could do this part at their own pace, but that they should not take too long. The coefficient of determination under this condition is larger than 0.84, which indicates a very strong correlation.
For group 1, the fit of Equation (4.1) gives a slope a = −0.0544 and a y-intercept b = 1.1492. For the same group, the scores obtained under font sizes 12 and 14 can also be described by a linear relationship. The average slopes for group 1 (a = −0.0544, Figure 7) and for group 2 (a = −0.01, Figure 8) are different. The minimum score for group 1 is about 30%, while for group 2 it is 87%.
The survey indicated that more than 90% of the students do not read. Compared with native speakers, non-native English-speaking students do not seem to spend as much time reading. This could be explained by many factors, such as difficulty in reading, lack of motivation, and environment.
The coefficient of determination indicates how much the factor being studied in the regression analysis contributes to the relationship between reading speed, score, and student. For the reading speed data (Figure 3 and Figure 4) and the score data (Figure 7 and Figure 8), the coefficient of determination R² is larger than 0.77, which indicates a very strong correlation between the students and their performance. On the other hand, the scores obtained by the students under controlled time show a very weak correlation (Figure 3 and Figure 4), with R² less than 0.2. This indicates that time is an important factor, changing R² from 0.2 (Figure 3 and Figure 4, score data) to more than 0.84 (Figure 7 and Figure 8).
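For reference, the coefficient of determination of a fitted line can be computed as R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares about the mean. The sketch below implements this standard definition; the function name and the sample points are hypothetical and are not taken from the figures.

```python
def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the line y = slope * x + intercept."""
    if len(xs) != len(ys) or not ys:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values must not all be identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    # Points lying exactly on a line give R² = 1.
    print(r_squared(xs, [2, 4, 6, 8], slope=2.0, intercept=0.0))
    # Scattered points with their least-squares line (slope 0.2, intercept 1.0).
    print(r_squared(xs, [1, 2, 1, 2], slope=0.2, intercept=1.0))
```

The second example lands at about 0.2, the value the text treats as the boundary of very weak correlation.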
It is extremely important for a reader to be motivated to read and then to use his or her skills to comprehend and understand the text. Motivation can be instilled through consistent encouragement by educators and parents, but interest can only be developed through regular reading. The complexity of the text affects the reader's motivation. In order to develop interest, it is essential that the text not be too complex to understand. If the text has difficult vocabulary and complex syntax, the reader loses interest in the very first phase of reading.
Reading skill, once developed, can be most easily maintained at a high level by the students. If the students have poor receptive skills, they will find reading monotonous. As a result, they will never interpret the text in the right context.
They will face multiple problems in comprehending and understanding English, especially in their academic courses, where the English language is most commonly used. Bridging the language gap caused by a student's lack of proficiency requires a great deal of collaborative effort and coordination between the student and the teacher to overcome the difficulties in the process of learning. Higher complexity requires more learning and results in less efficient human performance.
In addition to the effect of the environment, the difference in reading speed and performance may be explained by individual differences and basic skills. We all differ in learning abilities and in task completion [16].
Knowledge, experience, environment, and familiarity help readers to remember and increase their reading speed. For example, it seems that most native English speakers remember English words more easily than non-native speakers do.
Reading in one's native language is easier than reading in a foreign language. A non-native speaker who uses English or learns English as a second language faces many difficulties in the process of learning English. Reading does not come easily. To enjoy reading and achieve proficiency, the reader must develop an interest in reading independently without being held back by difficulties. Reading should be developed as a habit so that the reader enjoys what he or she reads.
Learning is the process of encoding in long-term memory the information that is contained in short-term memory. It is a complex process that requires consistent effort. The learning process is improved through repetition and deep analysis, but only if the information being transferred from short-term memory has structure and is meaningful and familiar. Based on the above learning processes, it can be ascertained that high readability requires high skill. In the case of the sample students, the lack of adequate reading is likely to reduce their process of encoding, which results in less efficient performance, indicated by slow reading speed and low scores caused by the low degree of familiarity and deep analysis.
The essence of skill is the performance of actions characterized by consistency and economy of effort. Given enough time, people can improve their performance in almost any task. Usability goals versus performance, in the form of measurable objectives, must also be established. Performance goals, such as the time it takes to complete tasks versus success rates, must be defined. Research indicates that a greater working memory is positively related to increased reading comprehension, reasoning skill, and learning of technical information [17]. In addition, information stored in working memory is variously thought to last from 5 to 30 seconds, and the storage capacity of working memory is estimated at about 3 to 4 items [14]. Based on the above theories, it seems that low reading speed can be explained by a short information storage time combined with a low storage capacity.
Lind et al. reported that there are two levels of information processing [18].
Both levels function simultaneously. The higher level, used for reading and understanding, consists of consciousness and working memory and performs reasoning and problem solving. This level is limited, slow, and sequential. In contrast, the lower level perceives the physical form of the information sensed and processes familiar information rapidly. It seems that processing at both levels is likely to be reduced in our students.

Conclusion
In this study we attempted to investigate the degree of readability for non-native English speakers. The study demonstrates that reading speed, which is an indicator for assessing a non-native English speaker's readability, is lower than that of a native English speaker. More importantly, the results show that the relationship between reading speed and score versus the reader can be described by linear regression with very strong correlation. The magnitudes of the extracted parameters, namely the slope and the y-intercept, may be used as a guideline to assess and evaluate readability. In our future work, to clarify the weak correlation obtained in the present investigation, experiments will be carried out with the genders separated, with controlled time, and at different grade levels.

Figure 1. Reading speed text. X = average number of syllables per 100 words: 140. Y = average number of sentences per 100 words: 5.1. G = 8.9 = grade level (the number in the white circle between the dark blue parallel lines is the grade level).

The score is the number of items the subject completes accurately within the time limit indicated in Figures 3-8. Data are represented using a scatter plot (x-y scatter diagram). During analysis we try to find the equation of a line that fits the data; this is called the regression line. The points, which are pairs (x = student, y = reading speed or score), can be plotted on the Cartesian coordinate system. From the study of correlation, when the slope of the regression line is positive, the value of y increases as x increases.

Figure 2. Reading speed text. X = average number of words with 6+ characters per 100 words: 22. Y = average number of sentences per 100 words: 5.1. G = 7.9 = grade level (the number between the pink parallel lines is the grade level).

More importantly, each group shows a linear relationship between V_RS or score and the reader. The average slopes obtained for groups 1 and 2 are given in Equations (1.1), (1.2) and (2.1), (2.2), respectively, with a slight difference: (a_rs1 = −0.015, a_score1 = −3.6759) and (a_rs2 = −0.0088, a_score2 = −1.87). The reading speed test at average grade level 9 gives approximately the same extracted parameters for group 1 (a_rs1 = −0.015, b_rs1 = 1.0034) and group 2 (a_rs2 = −0.0088, b_rs2 = 0.8427). Based on these extracted results, it seems that the two groups perform quite similarly. On the other hand, at the higher average grade level 14, for text comprehension, the extracted parameters for group 1 (a_score1 = −3.6759, b_score1 = 104.82) and for group 2 (a_score2 = −1.87, b_score2 = 78.911) show that the readability performance is quite different. Note that the x-axis is the name of the student, not the number of the student, and that the graph was obtained after sorting the data from the smallest to the highest score.
Linear regressions for the reading speed and the score obtained by the students of group 1 (Figure 3) are given in Equations (1.1) and (1.2), respectively. Linear regressions for the reading speed and the score versus the students of group 2 (Figure 4) are given in Equations (2.1) and (2.2), respectively. The coefficient of determination R² is a statistical measure of how close the data are to the fitted regression line. The reading speed data indicate a very strong positive correlation (R² > 0.77) between V_RS and student, while the score versus the student shows a very weak correlation, with R² less than 0.2.

Figure 5 and Figure 6 show the score obtained by the students versus V_RS using font size 12. When V_RS changes from approximately 30 to 130 [words/min], the score increases linearly from approximately 70% to 120%. The relationship between V_RS and the corresponding score may be described by the linear regressions given in Equations (3.1) and (3.2) for group 1 (Figure 5) and group 2 (Figure 6), respectively, with average slope and y-intercept (a_ss1 = 0.0038, b_ss1 = 0.5931) for group 1 and (a_ss2 = 0.0039, b_ss2 = 0.59097) for group 2. The two samples appear essentially the same. The coefficient of determination R² is less than 0.2, which indicates a very weak correlation.