The user model, which is the representation of information about a user, is the heart of adaptive systems: it helps adaptive systems perform adaptation tasks. There are two kinds of adaptation: 1) individual adaptation, with regard to each user; 2) group adaptation, focusing on groups of users. To support group adaptation, the basic problem to be solved is how to create user groups. This relates to clustering techniques so as to cluster user models, because a group is considered a cluster of similar user models. In this paper we discuss two clustering algorithms, *k*-means and *k*-medoids, and also propose dissimilarity measures and similarity measures which are applied to different structures (forms) of user models such as vector, overlay, and Bayesian network.

A user model is the representation of personal traits (or characteristics) of a user, such as demographic information, knowledge, learning style, goal, interest, etc. A learner model is defined as a user model in a learning context, in which the user is a learner who profits from an (adaptive) learning system. We note that learner model, student model, and user model are the same terms in the learning context. Adaptive systems exploit valuable information in the user model so as to provide adaptation effects, i.e., to treat different users in different ways. For example, an adaptive system tunes learning materials to a user in order to provide the best materials to her/him. The usual adaptation effect is to give individual adaptation to each user, but there is also a demand to provide adaptation to a group or community of users. Consequently, all users in the same group will profit from the same learning materials, teaching methods, etc. because they have common characteristics. So there are two kinds of adaptation:

1) Individual adaptation, with regard to each user;

2) Community (or group) adaptation focusing on a community (or group) of users.

Group adaptation has more advantages than individual adaptation in some situations:

1) Common features of a group, which are the information shared by all members of that group, are relatively stable, so it is easy for adaptive systems to perform adaptive tasks accurately.

2) When a new user logs into the system, she/he is classified into a group and the initial information of her/his model is assigned from the common features of that group.

3) In collaborative learning, users need to learn or discuss together. It is very useful if collaborative learning is restricted to a group of similar users: it is convenient for users that have common characteristics (knowledge, goal, interest, etc.) to learn together, because they do not come up against obstacles when interacting with each other.

The problem that needs to be solved now is how to determine user groups. This relates to clustering techniques so as to cluster user models, because a group is considered a cluster of similar user models. Sections 2 and 3 discuss user model clustering techniques, namely the k-means algorithm for the vector model, the overlay model, and the Bayesian network. In Section 3, we propose formulations to compute the dissimilarity of two overlay models (or two Bayesian networks). The k-medoids algorithm and similarity measures such as the cosine similarity measure and the correlation coefficient are discussed in Section 4. Section 5 is the conclusion.

Suppose user model U_{i} is represented as a vector U_{i} = {u_{i}_{1}, u_{i}_{2},…, u_{ij},…, u_{in}}. If U_{i} represents user knowledge, then U_{i} is considered a knowledge vector in which the j^{th} component of this vector is the conditional probability expressing how well the user masters the j^{th} knowledge item. Note that U_{i} is called user model, user vector, or user model vector.

Suppose there is a collection of users U_{1}, U_{2},…, U_{m}, each represented by such a vector. The dissimilarity between two user models is measured by the Euclidean distance:

dissim(U_{1}, U_{2}) = sqrt( Σ_{j} (u_{1j} − u_{2j})^{2} )

The less dissim(U_{1}, U_{2}) is, the more similar U_{1} and U_{2} are. Applying the k-means algorithm, user models are clustered through the following steps:

1) It randomly selects k user models, each of which initially represents a cluster mean; thus we have k cluster means. Each mean is considered the “representative” of one cluster, so there are k clusters.

2) For each remaining user model, the dissimilarities between it and the k cluster means are computed. The user model is assigned to the cluster it is most similar to; that is, user model U_{i} belongs to cluster C_{j} if the dissimilarity measure dissim(U_{i}, C_{j}) is minimal.

3) After that, the means of all clusters are re-computed. If the stopping condition is met, the algorithm terminates; otherwise it returns to Step 2.

This process is repeated until the stopping condition is met; for example, the stopping condition may be that the square-error criterion is less than a pre-defined threshold. The square-error criterion is defined as below:

E = Σ_{i=1}^{k} Σ_{U∈C_{i}} dissim(U, m_{i})^{2}

where C_{i} and m_{i} are cluster i and its mean, respectively; dissim(U, m_{i}) is the dissimilarity between user model U and the mean of cluster C_{i}.

The mean m_{i} of a cluster C_{i} is the center of that cluster: the vector whose t^{th} component is the average of the t^{th} components of all user model vectors in cluster C_{i}:

m_{it} = (1/n_{i}) Σ_{U∈C_{i}} u_{t}

where n_{i} is the number of user models in cluster C_{i} and u_{t} is the t^{th} component of U.
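The three steps above can be sketched in a few lines; this is a minimal illustration assuming plain numeric user vectors, Euclidean distance as dissim, and an optional fixed initialization for reproducibility (all function and parameter names are ours):

```python
import math
import random

def dissim(u, v):
    # Euclidean distance between two user model vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_mean(cluster):
    # The t-th component of the mean is the average of the
    # t-th components of all user model vectors in the cluster
    n = len(cluster)
    return [sum(u[t] for u in cluster) / n for t in range(len(cluster[0]))]

def k_means(users, k, threshold=1e-6, max_iter=100, init=None):
    # Step 1: pick k user models as the initial cluster means
    means = [list(m) for m in (init or random.sample(users, k))]
    error = float("inf")
    for _ in range(max_iter):
        # Step 2: assign each user model to the nearest cluster mean
        clusters = [[] for _ in range(k)]
        for u in users:
            i = min(range(k), key=lambda j: dissim(u, means[j]))
            clusters[i].append(u)
        # Step 3: re-compute the means; stop when they no longer move
        new_means = [cluster_mean(c) if c else means[i]
                     for i, c in enumerate(clusters)]
        # Square-error criterion over all clusters
        error = sum(dissim(u, new_means[i]) ** 2
                    for i, c in enumerate(clusters) for u in c)
        if all(dissim(m, nm) < threshold for m, nm in zip(means, new_means)):
            return clusters, new_means, error
        means = new_means
    return clusters, means, error
```

A fixed `init` makes the run deterministic; in practice the random initialization of Step 1 is repeated several times and the result with the smallest square error is kept.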

Figure: User model clusters (means are marked by sign “+”)

If the user is modeled as a vector, the dissimilarity measure in the k-means algorithm is the Euclidean distance. However, there is a question: how is such a measure computed when the user model is an overlay model in the form of a domain graph? In this situation, the domain is decomposed into a set of elements and the overlay model is simply a set of masteries over those elements. So the overlay model is a subset of the domain model (see the overlay model figure below).

In essence, the overlay model is a graph over the domain; so the overlay model is also called a graph model. Each node (or vertex) in the graph is a knowledge item, represented by a number indicating how well the user masters that knowledge item. Each edge (or arc) reveals the relationship between two nodes. Clearly, the dissimilarity measure needs changing so as to compare two overlay models. Note that the terms “user model”, “overlay model”, and “graph model” are the same in this context. Suppose two user overlay models U_{1} and U_{2} are denoted as below:

U_{1} = G_{1} = (V_{1}, E_{1}), U_{2} = G_{2} = (V_{2}, E_{2})

where V_{i} and E_{i} are the set of nodes and the set of arcs, respectively. V_{i} is also considered a vector whose elements are numbers representing the user’s masteries of knowledge items.

Suppose the graph model is in the form of a tree in which each directed arc represents the prerequisite relationship between two nodes. If there is an arc from node A to node B, the user must master A before learning B.

Let depth(v_{ij}) be the depth level of node j of graph model G_{i}. Note that the depth level of the root node is 1.

Given the assumption that the structure of the graphs of all users is kept intact, the dissimilarity (or distance) of two graph models G_{1} and G_{2} is defined as below:

dissim(G_{1}, G_{2}) = sqrt( Σ_{j} ((v_{1j} − v_{2j}) / depth(v_{1j}))^{2} )

The meaning of this formulation is: the high level concept (node) is the aggregation of low level (basic) concepts.

The depth levels of j^{th} nodes of all graph models are the same because the structure of graphs is kept intact.

For example, there are three graph models whose structures are the same as the tree structure shown above; the values of their nodes are given in the table of graph node values.

The dissimilarities (or distances) between G_{3} and G_{1}, and between G_{3} and G_{2}, are computed as below:

So G_{2} is more similar to G_{3} than G_{1} is.
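As an illustration of comparing tree-structured overlay models, the sketch below scales each node’s value difference down by its depth level before taking a Euclidean distance, so that differences at high-level nodes count more. The tree (node names and depths) is hypothetical and the weighting is our simplification of the formulation above:

```python
import math

# Hypothetical tree shared by all graph models:
# root A (depth 1), children B and C (depth 2), D a child of C (depth 3)
depth = {"A": 1, "B": 2, "C": 2, "D": 3}

def graph_dissim(g1, g2):
    # Depth-weighted Euclidean distance: differences at shallow
    # (high-level) nodes are divided by a smaller depth, so they
    # contribute more, reflecting that a high-level concept is the
    # aggregation of low-level concepts
    return math.sqrt(sum(((g1[v] - g2[v]) / depth[v]) ** 2 for v in depth))

g1 = {"A": 2, "B": 1, "C": 1, "D": 0}
g2 = {"A": 1, "B": 1, "C": 0, "D": 1}
```

Because the structure is kept intact for all users, the two models can be compared node by node through a shared depth table.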

Figure: Overlay model

Figure: Graph model in form of tree and prerequisite relationships

Table: Values of graph nodes

|        | A | B | C | D | E | F | G |
|--------|---|---|---|---|---|---|---|
| G_{1} | 2 | 1 | 1 | 0 | 3 | 2 | 1 |
| G_{2} | 1 | 1 | 0 | 1 | 4 | 5 | 4 |
| G_{3} | 2 | 1 | 1 | 1 | 4 | 5 | 4 |

In case each arc is assigned a weight representing the strength of the prerequisite relationship between parent node and child node, the following figure shows this situation:

The dissimilarity (or distance) of two graph models G_{1} and G_{2} is then re-defined as below:

dissim(G_{1}, G_{2}) = sqrt( Σ_{j} (weight(v_{1j}) · (v_{1j} − v_{2j}))^{2} )

Let weight(v_{ij}) be the weight of the arc from node j (of graph model i) to its parent; we consider weight(v_{ij}) to be the weight at node v_{ij}. The sum of the weights at nodes having the same parent is equal to 1.

The weight at root node is equal to 1.

The weights at the j^{th} nodes of all graph models are the same because the structure of the graphs is kept intact.

For example, there are three graph models whose structures are the same as the weighted tree structure shown in the figure below. The dissimilarities between G_{3} and G_{1}, and between G_{3} and G_{2}, are computed as below:

Figure: Graph model and weighted arcs

So G_{1} is more similar to G_{3} than G_{2} is.

A Bayesian network is a directed acyclic graph (DAG) in which each node is attached a conditional probability table (CPT).

Suppose the structure of the Bayesian network is in the form of a tree.

Let G_{1} = (V_{1}, E_{1}) and G_{2} = (V_{2}, E_{2}) be two Bayesian networks of user_{1} and user_{2}, respectively,

where V_{i} and E_{i} are a set of nodes and a set of arcs, respectively. The dissimilarity (or distance) of two Bayesian networks G_{1} and G_{2} is defined as below:

dissim(G_{1}, G_{2}) = sqrt( Σ_{j} (Pr(v_{1j}) − Pr(v_{2j}))^{2} )

where Pr(v_{ij}) is the marginal probability of node j in network i, computed by standard Bayesian inference techniques.
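To make the comparison concrete, the sketch below computes marginal probabilities for binary nodes in a tree-structured network by propagating from the root downward, then takes a plain Euclidean distance over the marginals. The network encoding and the toy CPT values are our own assumptions:

```python
import math

def marginals(net):
    # net maps a node either to ("root", prior) or to
    # (parent, P(node=1 | parent=1), P(node=1 | parent=0))
    pr = {}
    def prob(v):
        if v not in pr:
            spec = net[v]
            if spec[0] == "root":
                pr[v] = spec[1]
            else:
                parent, p1, p0 = spec
                pp = prob(parent)
                # Marginalize the parent out of the CPT:
                # Pr(v=1) = P(v=1|par=1)Pr(par=1) + P(v=1|par=0)Pr(par=0)
                pr[v] = p1 * pp + p0 * (1 - pp)
        return pr[v]
    for v in net:
        prob(v)
    return pr

def bn_dissim(net1, net2):
    # Euclidean distance between the two vectors of marginal probabilities
    m1, m2 = marginals(net1), marginals(net2)
    return math.sqrt(sum((m1[v] - m2[v]) ** 2 for v in net1))

# Two networks with the same tree structure but different CPTs
net1 = {"A": ("root", 0.8), "B": ("A", 0.9, 0.2)}
net2 = {"A": ("root", 0.5), "B": ("A", 0.9, 0.2)}
```

Because both networks share the same structure, the distance can be taken node by node over the inferred marginals, exactly as the vector case does over component values.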

Although a dissimilarity measure, considered as the physical distance between user models, can be used to cluster them, we can also use another kind of measure, a so-called similarity measure, to cluster user models. There are two typical similarity measures:

1) Cosine similarity measure

2) Correlation coefficient

Figure: Bayesian network and its CPTs

Suppose user model U_{i} is represented as vector U_{i} = {u_{i}_{1}, u_{i}_{2},…, u_{ij},…, u_{in}}. The cosine similarity measure of two user models is the cosine of the angle between the two vectors:

sim(U_{1}, U_{2}) = (U_{1} · U_{2}) / (|U_{1}| |U_{2}|) = Σ_{j} u_{1j}u_{2j} / ( sqrt(Σ_{j} u_{1j}^{2}) · sqrt(Σ_{j} u_{2j}^{2}) )

The range of the cosine similarity measure is from 0 to 1 (for non-negative vectors). If it equals 0, two users are totally different; if it equals 1, two users are identical. For example, consider the four user models given in the table of four user models below.

The cosine similarity measures of user_{4} with user_{1}, user_{2}, and user_{3} are computed as below:

sim(user_{4}, user_{1}) = 5/√30 ≈ 0.91
sim(user_{4}, user_{2}) = 4/√45 ≈ 0.60
sim(user_{4}, user_{3}) = 6/√210 ≈ 0.41

Obviously, user_{1} and user_{2} are more similar to user_{4} than user_{3} is.
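The cosine computation can be replayed in a few lines; a minimal sketch using the four user vectors from the table of four user models (the function name is ours):

```python
import math

def cosine(u, v):
    # Cosine of the angle between two user model vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Values taken from the table of four user models
user1, user2, user3, user4 = [1, 2, 1], [2, 1, 2], [4, 1, 5], [1, 2, 0]
```

Comparing user_{4} against the others reproduces the ordering in the text: user_{1} is closest, then user_{2}, and user_{3} is least similar.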

On the other hand, the correlation coefficient, a concept from statistics, is also used to specify the similarity of two vectors. Let mean(U_{i}) denote the average of the components of vector U_{i}.

The correlation coefficient is defined as below:

corr(U_{1}, U_{2}) = Σ_{j} (u_{1j} − mean(U_{1}))(u_{2j} − mean(U_{2})) / ( sqrt(Σ_{j} (u_{1j} − mean(U_{1}))^{2}) · sqrt(Σ_{j} (u_{2j} − mean(U_{2}))^{2}) )

The range of the correlation coefficient is from –1 to 1. If it equals –1, two users are totally different; if it equals 1, two users are identical. For example, the correlation coefficients of user_{4} with user_{1}, user_{2}, and user_{3} are computed as below:

corr(user_{4}, user_{1}) ≈ 0.87
corr(user_{4}, user_{2}) ≈ –0.87
corr(user_{4}, user_{3}) ≈ –0.96

Table: Four user models

|           | feature_{1} | feature_{2} | feature_{3} |
|-----------|-------------|-------------|-------------|
| user_{1} | 1 | 2 | 1 |
| user_{2} | 2 | 1 | 2 |
| user_{3} | 4 | 1 | 5 |
| user_{4} | 1 | 2 | 0 |

Obviously, user_{1} and user_{2} are more similar to user_{4} than user_{3} is.
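A quick sketch of the correlation computation on the same four user vectors (names are ours):

```python
def corr(u, v):
    # Pearson correlation coefficient between two user model vectors
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]          # deviations from the mean of u
    dv = [b - mv for b in v]          # deviations from the mean of v
    num = sum(a * b for a, b in zip(du, dv))
    den = (sum(a * a for a in du) ** 0.5) * (sum(b * b for b in dv) ** 0.5)
    return num / den

# Values taken from the table of four user models
user1, user2, user3, user4 = [1, 2, 1], [2, 1, 2], [4, 1, 5], [1, 2, 0]
```

Unlike the cosine measure, the correlation coefficient first centers each vector on its own mean, so it reacts to the direction of variation rather than the raw magnitudes.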

If the cosine measure or the correlation coefficient is used as the similarity between two user vectors, a serious problem occurs: it is impossible to specify the mean of each cluster, because the cosine measure and the correlation coefficient have different semantics from the Euclidean distance. Instead of the k-means algorithm, we apply the k-medoids algorithm, whose steps are as follows:

1) It randomly selects k user models as k medoids. Each medoid is considered as the “representative” of one cluster. There are k clusters.

2) For each remaining user model, the similarities between it and the k medoids are computed. The user model belongs to cluster C_{i} if the similarity measure between this user model and the medoid of C_{i} is the highest.

3) After that, the k medoids are selected again so that the average similarity of each medoid to the other user models in the same cluster is maximal. If the k medoids are not changed, or the absolute-error criterion is less than a pre-defined threshold, the algorithm terminates; otherwise it returns to Step 2.

The absolute-error criterion, which is the typical stopping condition, is defined as below:

E = Σ_{i=1}^{k} Σ_{U∈C_{i}} sim(U, medoid_{i})

where C_{i} and medoid_{i} are cluster i and its medoid, respectively; sim(U, medoid_{i}) is the similarity of user model U and the medoid of cluster C_{i}.
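The steps above can be sketched as follows, using cosine similarity and a fixed initialization for reproducibility; the names and the stopping test (which simply checks whether the medoids changed) are our own simplifications:

```python
import math
import random

def cosine_sim(u, v):
    # Cosine similarity between two user model vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def k_medoids(users, k, sim=cosine_sim, max_iter=100, init=None):
    # Step 1: pick k user models as the initial medoids
    medoids = list(init) if init else random.sample(users, k)
    for _ in range(max_iter):
        # Step 2: assign each user model to the most similar medoid
        clusters = [[] for _ in range(k)]
        for u in users:
            i = max(range(k), key=lambda j: sim(u, medoids[j]))
            clusters[i].append(u)
        # Step 3: re-select each medoid as the member with the highest
        # total similarity to the members of its cluster
        new_medoids = [max(c, key=lambda m: sum(sim(m, u) for u in c))
                       if c else medoids[i]
                       for i, c in enumerate(clusters)]
        if new_medoids == medoids:
            return clusters, medoids
        medoids = new_medoids
    return clusters, medoids
```

Because a medoid is always an actual user model rather than a computed mean, this procedure works with any similarity measure, including the cosine measure and the correlation coefficient.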

We discussed two clustering algorithms: k-means and k-medoids. It is concluded that dissimilarity measures, considered as distances between two user models, are appropriate for the k-means algorithm, while similarity measures such as the cosine similarity measure and the correlation coefficient fit the k-medoids algorithm. The reason is that specifying the mean of a cluster essentially relates to computing distances among user models. Conversely, the cosine similarity measure and the correlation coefficient are more effective than a distance measure in some respects because they pay attention to the direction of the user vector. For example, the range of the correlation coefficient is from –1 (“two users are totally different”) to 1 (“two users are identical”).

However, the cosine similarity measure and the correlation coefficient are only used for the vector model; they cannot be applied to the overlay model or the Bayesian network model. It is clear that the formulations we proposed to compute the dissimilarity of two overlay models (or Bayesian networks) are variants of distances between user models.

Figure: k-medoids algorithm (user model vectors considered as medoids are marked by sign “+”)

Figure: Flow chart of k-medoids algorithm