Improving learning outcomes has always been an important motivating factor in educational inquiry. In a blended learning environment, where e-learning and traditional face-to-face class tutoring are combined, there are opportunities to explore the role of technology in improving students’ grades. A student’s performance is affected by many factors such as engagement, self-regulation, peer interaction, the tutor’s experience and the tutor’s time involvement with students. Furthermore, e-course design factors such as providing personalized learning are an urgent requirement for an improved learning process. In this paper, an artificial neural network model is introduced as a type of supervised learning, in which the network is provided with example input parameters of learning together with the desired correct output for each input. We also describe how to use e-learning interactions and social analytics to train an artificial neural network into a converging mathematical model. Students’ performance can then be efficiently predicted, and the danger of failing an enrolled e-course should thereby be reduced.

Education is imperative for every nation, for it improves individuals’ lives by equipping them with the skills and knowledge that let them cope with life’s challenges. Technological developments today influence every aspect of life, including education. Technology provides speed and convenience and hence has become a vital instrument in the educational process [

Educational theorists [

Inspired by advances in social network analytics, the document analysis concept is carried out through the study of engagement in e-learning during a semester, identifying several variables that describe student engagement. The network of co-occurrences between different variables, collected on a specific set, allows a quantitative study of the structure of contents in terms of the nature and intensity of the correlations or interconnections. The sub-domains are placed using structural equivalence techniques by grouping variables at different stages. A scientific field is characterized by a group of variables, which signify its concepts, operations and methodologies. The structure described by the frequency of co-occurrences of conceptual variables exposes the important relationships across these variables. These analyses of co-occurrences of variables allow us to comprehend the static and dynamic sides of the space in which work can be related and placed within a hierarchy of scientific research concepts. This technique, classified as co-variable analysis, provides a direct quantitative manner of linking the conceptual contents.

According to Edelstein [

In the present research, a normalized co-variable matrix built from the 56 most-used categorized variables (features, or predictors) is used to study their contribution to the learning process. This matrix is split using its mean density to derive the correlation matrix and build the network map. In order to achieve higher levels of computational capacity, a more complex structure of neural networks is required. We use a multilayer perceptron neural network, which maps a set of input variables onto a set of output data. It consists of multiple layers of nodes in a directed graph in which each layer is connected to the next one. It is one of the most popular and successful techniques for many subjects such as content analysis, pattern recognition and document image analysis. It is also a potent technique for solving many real-world problems such as predicting financial time series, identifying clusters of valuable customers, diagnosing medical conditions and detecting fraud (see for instance [
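As an illustration of splitting a normalized co-variable matrix by its mean density to obtain a network map, the following sketch thresholds the co-occurrence values at their mean; the matrix here is random toy data for 5 of the 56 variables, not the study’s actual matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy normalized co-occurrence matrix for 5 variables
# (illustrative values, not the paper's data).
C = rng.random((5, 5))
C = (C + C.T) / 2              # co-occurrence is symmetric
np.fill_diagonal(C, 0.0)       # no self co-occurrence

# Mean-density threshold: keep only links stronger than the mean
# of the off-diagonal co-occurrence values.
threshold = C[np.triu_indices_from(C, k=1)].mean()
A = (C > threshold).astype(int)   # adjacency matrix of the network map

print("threshold:", round(float(threshold), 3))
print(A)
```

Links surviving the threshold form the edges of the network map; structural-equivalence grouping can then be run on the resulting adjacency matrix.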

The multilayer perceptron neural network has not, to the best of our knowledge, been applied comprehensively to e-learning optimization using supervised and unsupervised learning. The questions which then arise are whether the neural network technique is indeed appropriate for such a problem, whether the architecture used to implement the technique reduces or complements its effectiveness, and whether the technique produces a particular system suited to the problem.

The aim of regression methods is to provide a definite model that helps determine, based on its features, the specific group to which a database object belongs. One of the usual uses of regression methods is predicting future student-engagement activity so that the institute can adapt its e-learning strategy [

Data mining is an automated analysis technique for large data sets whose purpose is to extract unobserved correlations (or dependencies) between data stored in data warehouses or other relational database schemes. The end-user may not even be aware of these correlations and dependencies, although the knowledge derived from extracting them may turn out to be exceedingly beneficial. Data mining techniques [

The artificial neural network (ANN) is a parallel and iterative method made up of simple processing units called neurons, while a multilayer neural network is a web of simple neurons called perceptrons. The concept of the single perceptron was introduced by Frank Rosenblatt in 1958. A multilayer perceptron neural network (MLP) is a perceptron-type network which distinguishes itself from the single-layer network by having one or more intermediate layers. Backward propagation of errors (or simply back-propagation), which has been used since the 1980s to adjust the weights, is a widespread process for training artificial neural networks. It is usually used in conjunction with an optimization method such as gradient descent. In an attempt to minimize the loss function, on each training iteration the current gradient of the loss function with respect to all the weights in the network is evaluated, and the gradient is then fed to the optimization method, which uses it to update the weights. In this study we selected the standard Levenberg-Marquardt algorithm (LMA) [
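A minimal sketch of the single-perceptron idea, trained with plain gradient descent on a toy, linearly separable problem (logical AND; illustrative data, not the paper's):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Logical AND: a linearly separable problem a single perceptron can learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.0, 1.0])          # target outputs

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=2)           # connection weights
b = 0.0                                     # bias
eta = 0.5                                   # learning rate in (0, 1)

for epoch in range(5000):
    y = sigmoid(X @ w + b)                  # forward: network response
    e = t - y                               # error: target minus response
    grad = y * (1 - y) * e                  # sigmoid derivative times error
    w += eta * X.T @ grad                   # gradient-descent weight update
    b += eta * grad.sum()

print(np.round(sigmoid(X @ w + b)))         # ≈ [0. 0. 0. 1.]
```

Replacing the plain gradient-descent update with a Levenberg-Marquardt step, as done in this study, changes only the optimization method, not the forward/error computation.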

In order to study the variables that contribute to the learning process and to the educational outcome, we propose to consider specific variables, collected during the semester of engagement, relevant to the students, their peers, the tutor and the university administration. The 56 variables, see

Data is contextual: the sequence and the environment of the data also have to be accounted for. For example, an activity performed by a student with a limited background is not the same as one performed by a student who has all the necessary prerequisites. After data acquisition from the online activities performed by all participants, filtering has to be applied in order to remove irrelevant data. Once all the data are clean and relevant, a second stage of feature extraction, clustering and classification is applied in order to extract knowledge from the data. As mentioned above, there are many methods for feature extraction and regression, such as artificial neural networks, decision trees, Markov models, Bayesian probability, principal component analysis, support vector machines, and regression analysis [

| Class | Variable | Full name | Class | Variable | Full name |
|---|---|---|---|---|---|
| Student online engagement | Var. 1 | Login into the e-class. | Student content analysis | Var. 30 | Course keywords used. |
| | Var. 2 | Participates in the e-learning forums. | | Var. 31 | Question context. |
| | Var. 3 | Starts a new thread in the forums. | | Var. 32 | Answer context. |
| | Var. 4 | Reads a thread from a classmate. | | Var. 33 | Comment context. |
| | Var. 5 | Votes on a post reply as “LIKE”. | | Var. 34 | Disputational comment. |
| | Var. 6 | Votes on a post reply as “DISLIKE”. | | Var. 35 | Cumulative comment. |
| | Var. 7 | Receives a “LIKE” reply. | | Var. 36 | Exploratory comment. |
| | Var. 8 | Receives a “DISLIKE” reply. | | Var. 37 | Effective interaction. |
| | Var. 9 | Starts a thread that a classmate votes up. | | Var. 38 | Ineffective interaction. |
| | Var. 10 | Starts a thread that a classmate votes down. | Tutor online engagement | Var. 39 | Answers a question. |
| | Var. 11 | Enters the online quizzes. | | Var. 40 | Gives a comment. |
| | Var. 12 | Solves the online questions. | | Var. 41 | Starts a thread. |
| | Var. 13 | Answers the online question at first instance. | E-course design | Var. 42 | Number of questions provided. |
| Student self-regulation | Var. 14 | Enjoyment. | | Var. 43 | Number of hours per week required from a student. |
| | Var. 15 | Anxiety. | | Var. 44 | Personalized (relevant to student background). |
| | Var. 16 | Boredom. | University support | Var. 45 | System downtime. |
| | Var. 17 | Hopelessness. | | Var. 46 | Identification of low-participation students. |
| | Var. 18 | Self-efficacy. | | Var. 47 | Identification of high-participation students. |
| | Var. 19 | Effort. | | Var. 48 | System bandwidth. |
| | Var. 20 | Ambition. | | Var. 49 | Emails sent to the student. |
| | Var. 21 | Goal oriented. | | Var. 50 | Emails sent to the tutor. |
| | Var. 22 | Self-organized. | Student information | Var. 51 | Year of admission. |
| Student background | Var. 23 | GPA before the e-course. | | Var. 52 | Semester of admission. |
| | Var. 24 | Average grades of prerequisite courses for the e-course. | | Var. 53 | Total credit hours. |
| | Var. 25 | High school GPA. | | Var. 54 | Number of warnings. |
| | Var. 26 | Family education. | | Var. 55 | Student status. |
| | Var. 27 | Number of family members. | | Var. 56 | Overall GPA. |
| | Var. 28 | Family income. | | | |
| | Var. 29 | Residency location. | | | |

These variables are fed into the learning algorithm that will investigate their impact on the educational outcome.

The identified values of the variables will be used to train the neural network. For more information about the variables the reader is referred to [

A multilayer neural network is a particular type of network consisting of a group of sensory units organized as cascading layers: an input layer, one or more intermediate hidden layers, and an output layer of neurons. The network is fully connected, such that every neuron of each layer is connected to all neurons in the preceding layer. At the beginning of the back-propagation process we should consider how many hidden layers are required; the computational complexity can be seen as that of the number of single-layer networks combined into the multilayer network. In this multilayer structure, the input nodes pass the information to the units in the first hidden layer, the outputs of the first hidden layer are passed to the next layer, and so on. It is worth noting that the network performs supervised learning, i.e., both the inputs and the outputs should be provided. The network processes the inputs and compares its resulting outputs against the desired corresponding outputs. Errors are then calculated, causing the system to adjust the weights and tune the network. This process takes place over and over as the weights are continually adjusted.
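A sketch of such a fully connected, layer-by-layer forward pass, assuming for illustration the 56 predictors as inputs, a 50-neuron hidden layer (the size used in the experiment reported below) and a single GPA output; the weights and input here are random placeholders:

```python
import numpy as np

def forward(x, layers):
    """Propagate an input through fully connected tanh layers."""
    for W, b in layers:
        x = np.tanh(W @ x + b)   # each layer feeds the next
    return x

rng = np.random.default_rng(2)
sizes = [56, 50, 1]              # inputs -> hidden layer -> output
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.random(56)               # one student's 56 predictor values
print(forward(x, layers))        # network response for this input
```

With random weights the response is meaningless; training (next section) adjusts the weights so the response approaches the desired output.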

The back-propagation algorithm for a multilayer neural network is summarized in forward and backward stages. In the forward stage, the signal that propagates through the network layers is calculated as follows:

$y_j = \varphi\Big(\sum_i w_{ji}\, y_i + b_j\Big),$

where $y_j$ is the output of neuron $j$, the sum runs over the neurons $i$ of the preceding layer, $w_{ji}$ is the weight of the connection from neuron $i$ to neuron $j$, $b_j$ is the bias and $\varphi$ is the activation function.

The error at the output layer is

$e_k = d_k - y_k.$

It represents the difference between the target output for an input pattern and the network response, and it is used to calculate the errors at the intermediate layers. This is done sequentially until the error at the very first hidden layer is computed. After computing the error for each unit, whether a hidden unit or an output unit, the network fine-tunes its connection weights by performing the backward stage. The general idea is to use gradient descent to update the weights so that the squared error between the network output values and the target output values is reduced. The backward stage can be performed as follows:

$w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, y_i, \qquad \delta_j = \begin{cases} \varphi'(v_j)\, e_j & \text{for an output unit,} \\ \varphi'(v_j) \sum_k \delta_k\, w_{kj} & \text{for a hidden unit,} \end{cases}$

where $\eta$ is the learning rate and $\delta_j$ is the local error gradient of neuron $j$. The learning rate, typically a small value between 0 and 1, controls the size of the weight modifications. Here the derivative $\varphi'$ of the activation function is evaluated at the local field $v_j$ of neuron $j$.
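As an illustration (not the paper's implementation, which uses the Levenberg-Marquardt algorithm), the forward and backward stages can be sketched with plain gradient descent on the classic XOR problem, which needs at least one hidden layer:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# XOR: illustrative toy data, not the study's data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
eta = 0.5                                        # learning rate in (0, 1)
losses = []

for epoch in range(20000):
    # forward stage: propagate the signal layer by layer
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # error at the output layer: target minus network response
    e = t - y
    losses.append(float((e ** 2).mean()))
    # backward stage: local gradient at the output, then at the hidden layer
    d2 = y * (1 - y) * e
    d1 = h * (1 - h) * (d2 @ W2.T)
    # gradient-descent weight updates
    W2 += eta * h.T @ d2
    b2 += eta * d2.sum(axis=0)
    W1 += eta * X.T @ d1
    b1 += eta * d1.sum(axis=0)

print("final MSE:", losses[-1])
```

The recorded `losses` show the squared error shrinking as the weights are repeatedly adjusted, which is the behavior the training section below relies on.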

The major problem in training a neural network is deciding when to stop the training. The algorithm is brought to an end when the network reaches a minimum error, which can be measured by the mean square error (MSE) between the network output values and the desired output values. The number of training and testing iterations can also be used as a stopping criterion.
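These two stopping criteria, an MSE goal and an iteration cap, can be sketched as follows; the halving “training pass” is a hypothetical stand-in for a real weight-update step:

```python
def train(update_step, max_epochs=1000, mse_goal=1e-3):
    """Run one training pass per epoch; stop when the MSE goal
    is met or the iteration cap is reached."""
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        mse = update_step()
        if mse <= mse_goal:
            break                  # minimum-error criterion met
    return epoch, mse

# Stand-in for one training pass: the error halves each epoch.
state = {"mse": 1.0}
def fake_step():
    state["mse"] *= 0.5
    return state["mse"]

result = train(fake_step)
print(result)                      # (10, 0.0009765625)
```

In practice a third criterion, validation-based early stopping, is also common and is what halted training in the experiment described below.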

Training the neural network to produce the right output for a given input is an iterative computational procedure. The root mean square error of the neural network output, evaluated on each training iteration, and the way this error changes across iterations are used to determine the convergence of the training. The challenge is to determine which indicators and input data should be used, and to amass enough training data to train the network appropriately. Many factors interact with each other to generate the observed data. These factors are organized into multiple layers, representing multiple abstractions, weights and biases. By using various numbers of layers and neurons, different levels of abstraction emerge with different features. It is possible to train a neural network to perform a particular function by adjusting the weights of the connections between the neurons. Errors are propagated backward through the network to control the weight adjustments. Network layers are trained until the errors fall below a threshold.

The process of training the neural network can be summarized as follows: input data are applied repeatedly, the actual outputs are calculated, and the weights are adjusted so that applying the inputs produces outputs as close as possible to the desired ones. After many rounds of training the weights should converge to some values, and the differences between the desired and actual outputs are minimized.

In our experiment, we observed 1879 students (in one semester) using student information criteria, mentioned

The first run of the algorithm using 50 hidden neurons produced the result in

| | Samples | MSE | R |
|---|---|---|---|
| Training | 1315 | 1.81246e−1 | 9.20899e−1 |
| Validation | 282 | 3.33569e−1 | 8.64064e−1 |
| Testing | 282 | 2.21908e−1 | 9.09618e−1 |

Retraining the network would generate different results due to different initial conditions and sampling.

After iteration 12, the error gradient calculated by the back-propagation algorithm was no longer decreasing; after a further 6 validation iterations, training therefore stopped at epoch 18 with a value of 0.035909 (

further, as did the Mu values. The neural network was then ready and trained to perform the desired function, which is to predict the Grade Point Average (GPA) for future students, provided the predictors are available.
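For illustration, the split of the 1879 observed students into the 1315/282/282 training, validation and testing samples reported above can be sketched as a random 70/15/15 partition (the seed and index names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1879                          # students observed in the experiment

# Random 70/15/15 partition matching the 1315/282/282 sample counts.
idx = rng.permutation(n)
train_idx = idx[:1315]
val_idx = idx[1315:1315 + 282]
test_idx = idx[1315 + 282:]

print(len(train_idx), len(val_idx), len(test_idx))   # 1315 282 282
```

The validation subset drives the early stopping described above, while the testing subset gives the unbiased MSE and R reported in the table.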

Neural networks learn from examples and capture functional relationships between variables in the data even if the underlying relationships are nonlinear or unknown. Even though neural networks are not transparent in their predictions, they can outperform many other methods of association, classification, clustering and prediction in supervised and unsupervised learning, as demonstrated by their high prediction performance for nonlinear systems. Furthermore, the training algorithm may change depending on the neural network structure, although the most common training algorithm used when designing networks is the back-propagation algorithm. The major problem in training a neural network is deciding when to stop, as well as the overtraining phenomenon, which occurs when the system memorizes patterns and thus loses the power to extrapolate.

Due to research constraints, in our experiment we selected only a subset of predictors in our training algorithm for students’ GPA; in future research it can be extended to all the other variables that were not selected, and hence improve the predictive performance. The nature and the causes of the correlations between the predictors also have to be explored. Furthermore, there are opportunities to experiment with other learning algorithms and contrast them with the neural network.