Knowledge Tracing Model Based on Learning Process

Knowledge tracing has long been a research hotspot in the field of educational data mining. Knowledge tracing can automatically discover students' weak knowledge points, which helps to improve students' motivation in learning and enables personalized guidance. Existing KT models have shortcomings, such as the limited way knowledge growth is calculated and an imperfect forgetting mechanism. To this end, we propose a new knowledge tracing model based on the learning process (LPKT). LPKT applies the idea of the Memory-Augmented Neural Network (MANN). When modeling the learning process of students, two additional important factors are considered: one is to take the students' current knowledge state into account when updating the dynamic matrix of the network, and the other is to improve the forgetting mechanism of the model. In this paper we verify the effectiveness and superiority of LPKT through comparative experiments, and show that the model improves knowledge tracing performance while making the deep knowledge tracing process easier to understand.

Personalized guidance [2] is among the important research topics in the field of intelligent education at present, and also a basic trend of future education development. The characteristics of knowledge tracing are personalization and automation [3]. Its task is to automatically track how students' knowledge state changes over time according to the interactions between students and an intelligent tutoring system, so as to accurately predict students' mastery of knowledge points and their performance on the next exercise. The KT task can be formalized as a supervised sequential learning problem: given a student's historical interaction sequence, predict the probability that the student answers the next exercise correctly. The typical knowledge tracing methods are Bayesian Knowledge Tracing (BKT) [4] and Deep Knowledge Tracing (DKT) [5]. The earliest knowledge tracing method is the probabilistic Bayesian knowledge tracing model (BKT), which is in essence a special case of the Hidden Markov Model. It divides the knowledge system into multiple knowledge points, treats the student's mastery of each knowledge point as a hidden state, and updates the probability distribution of the hidden variables according to the student's historical answers. However, the BKT model has the following shortcomings [6]: first, it needs labeled data; second, each knowledge-point concept is represented separately, so BKT cannot capture the correlations between different concepts or effectively represent complex concept-state transitions. With the development of deep learning, researchers applied deep learning to knowledge tracing and proposed the deep knowledge tracing model (DKT), which uses a Long Short-Term Memory (LSTM) network [7] for the knowledge tracing task. It not only achieves better prediction performance than BKT, but also does not require experts to annotate the knowledge points of exercises.
However, the DKT model represents students' mastery of knowledge points with a single hidden state, and this hidden state cannot be interpreted. Therefore, the DKT model cannot output students' mastery level of each knowledge point in detail [8]. Moreover, LSTM stores all memories in one hidden vector, which makes it difficult to accurately remember sequences longer than a few hundred time steps. The Memory-Augmented Neural Network (MANN) [9] is a solution to these problems: it allows the network to retain multiple hidden state vectors, which are read and written separately. Representative MANN models include the end-to-end memory network [10] and the dynamic memory network [11]. A notable development in knowledge tracing is the Dynamic Key-Value Memory Network (DKVMN) proposed in 2017 [12]. It draws on the ideas of MANN, combines the advantages of BKT and DKT, and achieves better prediction performance. In addition, DKVMN has several advantages over LSTM, including avoiding overfitting, using fewer parameters, and automatically discovering similar exercises through latent concepts.

Deficiencies of the Dynamic Key-Value Memory Network
Although DKVMN has made a breakthrough in the field of knowledge tracing, it still has several deficiencies. First, there are limitations in the calculation of knowledge growth. In DKVMN, knowledge growth is calculated by multiplying the student's question-answering activity with a trained embedding matrix, which means that the knowledge growth gained after each answering activity is related only to that activity. In fact, from the perspective of the human cognitive process, the knowledge growth a student gains in learning should also be related to the student's current knowledge state [13]. For the same answering activity, the knowledge growth gained by a student with a certain foundation and by a student who has never encountered the knowledge point is different [14].
Second, DKVMN relies too much on the forgetting mechanism of the model itself.
In DKVMN, the update of the students' knowledge state is inspired by the LSTM forgetting mechanism. First, an "erase" vector is calculated by a hidden layer with a sigmoid activation, and then the "erase" vector and the student's knowledge growth are used to update the student's dynamic knowledge state matrix [15]. However, according to the research of the German psychologist Ebbinghaus on human forgetting, the forgetting process in learning is also affected by the student's current knowledge state [16]. In fact, the amount of knowledge forgotten by a student who has just started learning should be greater than that forgotten after the student has learned for a period of time.
Finally, the forgetting mechanism is not considered in the prediction process.
DKVMN uses the large memory capacity of MANN to model the learning process of students. MANN was originally widely used in intelligent question answering and machine translation, where it stores learned knowledge in a dynamic matrix by reading a large number of documents; the question-answering and translation processes are therefore similar to retrieval. In knowledge tracing, however, predicting whether a student can correctly answer the next question is not a simple retrieval process: it must also consider the student's forgetting during learning, which DKVMN obviously does not.
From the above, we can see that the superiority of current deep knowledge tracing methods is attributed to the deep learning models themselves. In essence, to achieve a better knowledge tracing effect, we need to start from human cognitive psychology and complete knowledge tracing by simulating students' learning and memory processes. Therefore, we propose a knowledge tracing model based on the learning process (LPKT), which adopts the idea of the Memory-Augmented Neural Network to model students' learning. We make two improvements: one is to consider the students' current knowledge state when updating the dynamic matrix of MANN, and the other is to improve the forgetting mechanism of the model.

Knowledge Tracing Model Based on Learning Process
The knowledge tracing model based on the learning process (LPKT) aims to complete knowledge tracing by simulating students' learning and memory processes.
Its structure is shown in Figure 1.

Attention Mechanism
The attention mechanism in MANN can be understood as finding the knowledge points involved in the question a student answers. Attention is used in both the reading and writing processes of MANN. The computation is as follows: first, the question q_t encountered by the student at time t is multiplied with a trained embedding matrix A to obtain the vector k_t; then k_t is processed with the static key matrix K to obtain the attention vector w_t:

k_t = A q_t                                          (1)
w_t(i) = \mathrm{Softmax}(k_t^\top K(i))             (2)
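The attention computation can be sketched in numpy as follows. The dimensions (Q exercises, N memory slots, key dimension d_k) and the randomly initialized A and K are purely illustrative; in the model both matrices are learned during training:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def attention(q_t, A, K):
    """Embed the exercise and compute attention weights over memory slots.

    q_t : one-hot exercise vector, shape (Q,)
    A   : trained embedding matrix, shape (Q, d_k)
    K   : static key matrix, shape (N, d_k)
    Returns (k_t, w_t): exercise embedding and attention weights.
    """
    k_t = A.T @ q_t          # k_t = A q_t  (selects the row of A for this exercise)
    w_t = softmax(K @ k_t)   # w_t(i) = Softmax(k_t^T K(i))
    return k_t, w_t

# toy example with Q=4 exercises, N=3 memory slots, d_k=5
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))
K = rng.normal(size=(3, 5))
q = np.eye(4)[2]             # exercise id 2, one-hot
k_t, w_t = attention(q, A, K)
```

Because q_t is one-hot, the matrix product simply selects the embedding row for that exercise; the softmax then distributes attention over the N latent knowledge points.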

Reading Process
The reading process is the prediction process of knowledge tracing. First, according to the attention vector, the student's mastery of the knowledge points involved in the question is read from the student's knowledge state matrix.
In DKVMN, this read is calculated as follows:

r_t = \sum_i w_t(i) V_t(i)

However, considering the forgetting mechanism in the learning process, we carry out two additional steps. First, we calculate the student's amount of forgetting according to the knowledge state V_t:

g_t(i) = \mathrm{Sigmoid}(W_g [V_t(i), k_t] + b_g)   (3)

Then, referring to the forgetting mechanism of LSTM, the knowledge state V'_t that accords with the student's learning law is calculated from the forgetting vector, the attention vector and the input vector:

V'_t(i) = g_t(i) \odot V_t(i)                        (4)

So we modify the read formula to:

r_t = \sum_i w_t(i) V'_t(i)                          (5)

Then the read content r_t and the input vector k_t are processed by a multi-layer perceptron to obtain the vector f_t, which reflects both the student's knowledge state and the characteristics of the question itself, such as its difficulty, and represents the student's comprehensive knowledge state for this specific question:

f_t = \mathrm{Tanh}(W_1 [r_t, k_t] + b_1)            (6)

Finally, f_t is passed through the sigmoid output layer:

p_t = \mathrm{Sigmoid}(W_2 f_t + b_2)                (7)

which gives the probability that the student answers the question correctly. This completes the reading process of the knowledge tracing method based on the learning and memory process.
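The reading steps can be sketched in numpy. This is a minimal sketch, assuming a sigmoid forget gate computed from each memory slot together with the exercise embedding; all parameter names and shapes here are illustrative assumptions, not the exact trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def read(V_t, w_t, k_t, params):
    """Reading (prediction) step with a state-dependent forget gate.

    V_t : knowledge state matrix, shape (N, d_v)
    w_t : attention weights, shape (N,)
    k_t : exercise embedding, shape (d_k,)
    Returns the predicted probability of a correct answer.
    """
    Wg, bg, W1, b1, W2, b2 = params
    # forget gate per slot, from the current state and the exercise
    slot_and_key = np.concatenate([V_t, np.tile(k_t, (V_t.shape[0], 1))], axis=1)
    g = sigmoid(slot_and_key @ Wg + bg)
    V_prime = g * V_t                                   # forget-adjusted state V'_t
    r_t = w_t @ V_prime                                 # r_t = sum_i w_t(i) V'_t(i)
    f_t = np.tanh(np.concatenate([r_t, k_t]) @ W1 + b1) # summary vector
    p_t = sigmoid(f_t @ W2 + b2)                        # probability of a correct answer
    return float(p_t)

# toy example with N=3 slots, d_v=5, d_k=5, hidden size d_f=6
rng = np.random.default_rng(1)
N, d_v, d_k, d_f = 3, 5, 5, 6
params = (rng.normal(size=(d_v + d_k, d_v)), np.zeros(d_v),
          rng.normal(size=(d_v + d_k, d_f)), np.zeros(d_f),
          rng.normal(size=d_f), 0.0)
V = rng.normal(size=(N, d_v))
w = np.array([0.2, 0.5, 0.3])
k = rng.normal(size=d_k)
p = read(V, w, k, params)    # a probability in (0, 1)
```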

Writing Process
The writing process in MANN updates the student's dynamic knowledge state in knowledge tracing. First, according to the MANN mechanism, a question-answering activity x_t = (q_t, a_t) is multiplied with another embedding matrix B to obtain the vector v_t, which represents the knowledge growth gained by the student:

v_t = B x_t                                          (8)

Ha points out that this activity-dependent knowledge growth is not enough to express the student's gains in the learning process, and proposes that the student's knowledge state should be considered when calculating knowledge growth. We therefore express the state-aware knowledge growth as

v'_t = \mathrm{Tanh}(W_v [v_t, r_t] + b_v)           (9)

where r_t is the read content reflecting the student's current knowledge state. After obtaining the student's knowledge growth, we update the dynamic matrix V in MANN with a method similar to the "forget gate" mechanism in LSTM, which is called "erase" in DKVMN. In DKVMN, the "erase" vector that determines the amount of forgetting is calculated as

e_t = \mathrm{Sigmoid}(E v_t + b_e)                  (10)

From this formula we can conclude that, for the same student, as long as the knowledge growth is the same, the "erase" vector is also the same, which is obviously contrary to common sense. Moreover, Ha points out that DKVMN's way of calculating the forgetting vector leads to too much content being forgotten; although Ha gives a regularization method to correct this, the correction is not very interpretable.
According to the human cognitive process [17] and the forgetting theory of learning, the forgetting of students' memory during learning is related not only to the current knowledge growth, but also to the student's current learning duration. In the knowledge tracing model of this paper, this correlation with learning duration can be understood as a correlation with the student's current knowledge state. Therefore, we combine the erase vector with the state-based forgetting vector g_t of formula (3):

e'_t(i) = e_t \odot g_t(i)                           (11)

After the values in the dynamic matrix V are "erased", we calculate the update vector from the knowledge growth vector, in a process similar to that of LSTM:

u_t = \mathrm{Tanh}(D v'_t + b_u)                    (12)

Finally, through the process of erasing and then updating, the student's dynamic knowledge state is updated as

V_{t+1}(i) = V_t(i) \odot (1 - w_t(i) e'_t(i)) + w_t(i) u_t   (13)

That is, after the student's answering behavior at time t, the dynamic matrix is transformed from V_t to V_{t+1}. The optimization goal of our model is to minimize the difference between the predicted and actual answer results, that is, to minimize the cross entropy between p_t and a_t. The loss function is

L = - \sum_t \big( a_t \log p_t + (1 - a_t) \log(1 - p_t) \big)   (14)

and we train with stochastic gradient descent.
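The erase-and-update step can be sketched in numpy along the same lines. The exact gating forms below (a tanh growth layer over the raw growth and the read content, and an erase vector modulated by a state-dependent forget gate) are a sketch under the assumptions stated in this section, with illustrative parameter shapes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def write(V_t, w_t, v_t, r_t, g_t, params):
    """Writing step: erase, then update the dynamic knowledge state matrix.

    V_t : knowledge state, shape (N, d_v)
    w_t : attention weights, shape (N,)
    v_t : raw knowledge-growth embedding, shape (d_v,)
    r_t : read content reflecting the current state, shape (d_v,)
    g_t : state-dependent forget gate per slot, shape (N, d_v)
    """
    Wv, bv, E, be, D, bu = params
    # state-aware knowledge growth v'_t
    v_prime = np.tanh(np.concatenate([v_t, r_t]) @ Wv + bv)
    e_t = sigmoid(v_prime @ E + be)   # erase vector from the growth
    erase = e_t * g_t                 # modulate erase with the forget gate
    u_t = np.tanh(v_prime @ D + bu)   # update ("add") vector
    # erase first, then add, weighted by attention
    V_next = V_t * (1 - w_t[:, None] * erase) + w_t[:, None] * u_t
    return V_next

# toy example with N=3 slots, d_v=5
rng = np.random.default_rng(2)
N, d_v = 3, 5
params = (rng.normal(size=(2 * d_v, d_v)), np.zeros(d_v),
          rng.normal(size=(d_v, d_v)), np.zeros(d_v),
          rng.normal(size=(d_v, d_v)), np.zeros(d_v))
V = rng.normal(size=(N, d_v))
w = np.array([0.2, 0.5, 0.3])
v = rng.normal(size=d_v)
r = rng.normal(size=d_v)
g = sigmoid(rng.normal(size=(N, d_v)))
V_next = write(V, w, v, r, g, params)
```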

Dataset
We verify the effectiveness of our method on the ASSISTments 2009 [18] and ASSISTments 2015 [19] datasets. These two datasets come from the ASSISTments online education platform and reflect students' learning activities on the platform. Xiong pointed out that the ASSISTments 2009 dataset contains problems such as duplicate records, so a corrected version, "skill_builder_data_corrected", was officially released. However, the skill-name attribute of this version is still partly empty. In addition, we found that the sparse learning records of some students are not helpful for modeling the learning process. We therefore screened the data and obtained the statistics shown in Table 1. A total of 315,527 records were retained, representing 3,091 students' answers to 110 different questions. Similarly, from the ASSISTments 2015 dataset we collected 628,507 records, covering 14,228 students' answers to 100 questions. In addition, in order to compare with DKVMN, the memory settings are shown in Table 2.
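The screening of sparse records can be illustrated with a simple filter; the threshold and the record format below are hypothetical, as the paper does not state the exact criterion:

```python
from collections import Counter

def screen(records, min_records=10):
    """Keep only records of students with at least `min_records` interactions.

    records: list of (student_id, question_id, correct) tuples.
    The threshold `min_records` is an illustrative assumption.
    """
    counts = Counter(sid for sid, _, _ in records)
    return [r for r in records if counts[r[0]] >= min_records]

# toy example: student "a" has 2 records, student "b" has 3
data = [("a", 1, 1), ("a", 2, 0),
        ("b", 1, 1), ("b", 3, 1), ("b", 2, 0)]
kept = screen(data, min_records=3)   # only student "b" remains
```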

Experimental Design
We divide each dataset into a training set, a cross-validation set and a test set: the training set accounts for 60% of the data, and the cross-validation and test sets account for 20% each. We use cross entropy as the loss function, train with the SGD optimizer, and set the learning rate to 0.005. We measure model performance with the area under the receiver operating characteristic (ROC) curve (AUC). As a performance measure, AUC has received wide attention in machine learning, especially for class-imbalance problems; moreover, most papers in the KT field use AUC as the evaluation metric, so our experimental results can easily be compared with others in the field.
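The 60/20/20 split and the AUC metric can be sketched as follows (a minimal rank-statistic AUC that ignores score ties, for illustration only):

```python
def split(data, train=0.6, val=0.2):
    """Split a sequence into 60% train, 20% validation, 20% test."""
    i, j = int(len(data) * train), int(len(data) * (train + val))
    return data[:i], data[i:j], data[j:]

def auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic (ties ignored for brevity)."""
    ranked = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    # sum of ranks held by the positive examples
    rank_sum = sum(r for r, (_, y) in enumerate(ranked, start=1) if y == 1)
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

# toy example: a model that ranks every correct answer above every incorrect one
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.3]
print(auc(labels, scores))   # -> 1.0
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of correct over incorrect answers, which is why it is robust to the class imbalance common in answer logs.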

Experimental Results and Analysis
Our experimental results are shown in Table 3. In addition, we analyze the training-set and validation-set results of DKVMN and LPKT during training. As shown in Figure 3, on both datasets the gap between the training-set and validation-set AUC of LPKT is comparable to that of DKVMN and is not large. This means that although LPKT increases the parameter scale of MANN, it does not cause the model to overfit; that is, the increase in parameters is reasonable.

Conclusions
Based on the human cognitive learning process, we proposed a knowledge tracing model based on the learning process by improving the forgetting mechanism and the knowledge growth mechanism of existing knowledge tracing models, which makes knowledge tracing more consistent with the human learning process and enhances the interpretability of the model.
In this paper, the LPKT model is compared with the DKVMN and DKT models on two datasets. The experimental results show that the AUC score of LPKT is significantly higher than that of DKVMN and DKT on both datasets, and that no overfitting occurs despite the increased parameter scale. This demonstrates the effectiveness and superiority of our model.
LPKT can be applied to a variety of online education platforms to help educators achieve personalized guidance.