J. Service Science & Management, 2009, 2: 400-403
doi:10.4236/jssm.2009.24048 Published Online December 2009 (www.SciRP.org/journal/jssm)
Copyright © 2009 SciRes JSSM
A Personalized Recommendation Algorithm Based
on Associative Sets
Guorui JIANG, Hai QING, Tiyun HUANG
School of Economics and Management, Beijing University of Technology, Beijing, China.
Email: jianggr@bjut.edu.cn
Received August 11, 2009; revised September 13, 2009; accepted October 24, 2009.
ABSTRACT
During the process of personalized recommendation, some items evaluated by users are performed by accident, in other
words, they have little correlation with users’ real preferences. These irrelevant items are equa l to noise data, and often
interfere with the effectiveness of collaborative filtering. A personalized recommendation algorithm based on Associa-
tive Sets is proposed in this paper to solve this problem. It uses frequent item sets to filter out noise data, and makes
recommendations according to users’ real preferences, so as to enhance the accuracy of recommending results. Test
results have proved the superiority of this algorithm.
Keywords: Frequent Itemsets, Associative Sets, Collaborative Filtering, Recommendation Technology
1. Introduction
How to help users quickly and effectively access to the
information they really need when facing abundant re-
sources becomes a challenging task and also a hot topic
of current academic study. Personalized recommenda-
tion system is one of the effective tools to solve this
problem. A helpful method is to develop intelligent
recommendation system to provide personalized service
[1], that is to recommend products to users according to
their preferences or demands, so as to help them finish
th e p ur c h asing pro cess.
Nearest neighbor collaborative filtering approach is a
recommendation technique that is the most widely used
right now [2]. Its basic idea is to generate recommenda-
tions for target users according to the rating data of near-
est neighbors that have given similar ratings. As items’
(movies, music, etc.) ratings g iven by the nearest nei g hbo rs
are quite similar to those given by target users, items’
ratings given by target users can be estimated by the w e ight -
ed average of the ratings given by the nearest neighbors.
The advantage of collaborative filtering approach is that
it can adapt to the rapid updating of users’ information. It
caculates the tightness among users according to the lat-
est data every time, so as to make recommendations. H ow -
ever, the consequent disadvantage is that it is quite slow
to get K nearest neighbors within large amounts of data.
Meanwhile, results would not be satisfactory when sparse
data is dealt with, especially for new products and new users.
At the same time, its scalability is not very good [3].
On the basis of traditional collaborative filtering algo-
rithm, our paper proposes a personalized recommendation
algorithm based on Associative Sets. This algorithm first
supposes user rating matrix as transaction sets, while ev ery
transaction is a user’s rating set. Then it generates freq ue nt
itemsets through frequent itemsets generation algorithm,
puts frequent itemsets into a series of Associative Sets
according to one user’s rating record, and performs col-
laborative filtering among Associative Sets so as to im-
prove the accuracy and scalability of the algorithm.
2. Traditional Collaborative Filtering
Algorithm and Its Analysis
2.1 Traditional Collaborative Filtering
Algorithm
Collaborative filtering algorithm is the most widely used
approach in personalized recommendations, which can
forecast target users’ interests and preferences according
to neighbor users’ interests and preferences. It first finds
neighbors that have the same preferences with target us-
ers under the help of statistical techniques, and then
makes recommendations to target users according to their
neighbors’ preferences. It includes three stages [4]:
1) Representation
Inputting dat a can usual l y be expressed as an m×n user
rating matrix, where m represents the number of users, n
represents the number of items, and Rij represents the
rating given by user i to item j. Such ratings can have
several scales just as Table 1.
A Personalized Recommendation Algorithm Based on Associative Sets401
Table 1. User/item rating matrix
Item
User
I1 I2 Ij In
U1 R11 R12 R1j R1n
U2 R21 R22 R2j R2n
Ui Ri1 Ri2 Rij Rin
Um Rm1 Rm2 Rmj Rmn
2) Neighbor Generation
For user u, generate a “nearest neighbor” set according
to the level of similarity between neighbors. The calcula-
tion of similarity values between neighbors can be per-
formed through vector space similarity calculation me thods
that are widely used currently, such as cosine method,
pearson similarity method and so on. There are two ways
to determine neighbors, one is to determine the similarity
threshold through cosine method first, and then select
users whose similarity values are greater than the simi-
larity threshold as neighbor users; the other is to deter-
mine the number of neighbor users N first, and then se-
lect the first N users whose similarity values are greater
as neighbor users.
3) Recommendation
As “nearest neighbor” set is generated, we can forecast
one certain user’s rating for each item, and then make
recommendations to that user according to the level of
forecasting ratings.
2.2 Problem Analysis
Traditional collaborati ve filtering al gorithm considers users’
entire historical information as its preference in formation
and uses such information to find its nearest neighbors.
However, users’ preferences are often formed exploringly
and progressively in reality. It is a historical progress,
and during this progress, users often try many times and
become stable gradually, so as to form their real interests.
Even though users’ interests have already been formed,
they would sometimes try other items in daily search
process for various reasons. Such items cannot be seen as
their interests and the supporting evidences for recom-
mendations. Therefore, if we want to gain real preference
information of users, we must filter out the occasional
search information to reduce interference. Find nearest
neighbors according to users’ real preference information,
and then make recommendations, while the results of
recommendations can become better.
3. A Personalized Recommendation
Algorithm Based on Associative Sets
As we have analyzed above, we can gain associative
items through frequent itemsets. These associative items
constitute the foundations of different interests. As for
current users, we use their entire information to filter
associative items, and then merge the associative items
after filtering to form their interest sets.
3.1 Algorithm Descriptions
1) Set all items as 123
{,, ,..., }
n
I
III I
, n as the number
of items. See every user’s rating record as one item of
transaction, , which represents the rating set of
user i, wherein
i
TI
{1,2,3,..., }im
, m represents the num-
ber of users. So, user rating matrix can be seen as trans-
action set 123
, ,...,}
m
TT T{ ,TT
.
2) Use Apriori algorithm to generate the frequent item-
sets F of transaction set T, whose support level is support.
In the first iteration of the algorithm, each item of I
is a member of the set of candidate 1-itemsets. The algo-
rithm simply scans all of the transactions T in order to
count the number of occurrences of each item.
Select the candidate 1-itemsets, which satisfies
minimum support support, to consist the set of frequent
1-itemsets 1
L.
Use 11
LL
to generate a candidate set of 2-itemsets,
and prune using apriori property---A ll nonempty su b s e ts o f
a frequent itemset must be frequent. Then, scan all of the
transactions T in order to count the number of occur-
rences of each item in candidate set of 2-itemsets.
Select the candidate 2-itemsets, which satisfies min-
imum support support, to consist the set of frequent
2-itemsets 2
L.
Constantly use 11kk
LL
to generate a candidate
set of k-itemsets, and prune it. Then, scan all of the
transactions T in order to count the occurrence of each
Copyright © 2009 SciRes JSSM
A Personalized Recommendation Algorithm Based on Associative Sets
402
item in candidate set of k-itemsets. Select the candidate
k-itemsets, which satisfies minimum support support, to
consist the set of frequent k-itemsets k
L.
If the candidate set of k-itemsets is null, all frequent
itemsets are gained.
3) Gain rating items *
I
of current user a, and merger
the frequent itemsets, which contains some items of *
I
and also the number it contains is more than parameter
num in F, as associative sets C.
4) Use Pearson correlation coefficient algorithm to cal-
culate the similarity between user a and any other user b
in associative sets C.
22
()(
CC
j
bj
R
R


)
(,)[( )][( )]
aj a bb
jC
C
CC
aj ab
jC jC
RR R
wab RR R


(1)
where j is the item in associative sets C, is the rat-
ing given by user a to item j, is the rating given by
user b to item j,
aj
R
bj
R
C
a
R and C
b
R are average ratings of
user a and user b separately.
5) For a, arrange all the users according to the value of
, and select the first M users that have greater
values as neighbor users of user a.
(,)
C
wab
a
6) Forecast the rating of u ser a to item j. The forecast-
ing formul a is:
Neighbor
,
)
bj
R
b
,
(,)()
(,
a
a
Cb
b Neighbor
aj aC
b Neighbor
wab R
PR wa

(2)
where is the forecasting rating of user a to item j,
is the rating of user b to item j,
,aj
P
,bj
Ra
R and b
R are
average ratings of user a and user b to all items.
7) Arrange items according to the value of , and
select the first N items that have greater as recom-
mendatory items.
,aj
P
,aj
P
3.2 Algorithm Explanations
1) In addition to the original rating records, there are four
other parameters in this algorithm: support for the calcula-
tion of frequent itemsets support, threshold for the selec-
tion of associative sets from frequent itemsets num, num-
ber of nearest neighbors M and number of items that are
recommended to users N. Wherein, support and num are
used to determine Associative Sets, but what is the right
combination needs to be tested. Usually, different data sets
have different proper combination of support and num.
Therefore, it will take more time to learn this algorithm.
2) Step 1 and step 2 in algorithm description are mai-
nly used to generate frequent itemsets, which will take
much more time. However, as it is performed offline,
instant recommendations cannot be influenced.
3) As frequent itemsets have to be merged (Step 3)
before collaborative filtering, it will take more time online
than traditional algorithm will take, but its accuracy can
be improved greatly. As the duration of merging frequent
itemsets relies on the number of frequent items, to reduce
frequent itemsets through offline activities can shortern
online duration. In addition, because associative filtering
items for every user are somewhat less than all the items,
the duration of collaborative filtering process itself will
be reduced. Through optimization, online duration of
algorithm can be reduced accordingly.
4) Frequent itemsets include the complete set of fre-
quent itemsets, the closed frequent itemsets, maximal fre-
quent itemsets and so on [5]. The frequent itemsets used
in this paper are maximal frequent itemsets, which can
reduce the number of frequent itemsets greatly. If other
frequent itemsets are used, we can calculate the impor-
tance of different items when calculating nearest neig-
hbors with the help of support when merging frequent
itemsets, which can improve the accuracy further more.
4. Test Process
4.1 Data Set and Evaluation Standard
Data set MovieLens is used to test this algorithm, which
is provided by the GroupLens research lab at the Univer-
sity of Minnesota. The data was collected through the
MovieLens web site (movielens.umn.edu) during a
seven-month period. MovieLens includes 100000 records
of ratings given by 943 users to 1682 movies. A rating is
a number from 1 to 5, optionally supplemented by the
number of seconds which the user spent reading the
movie. Users are encouraged to assign ratings based on
how much they liked the movie, with 5 highest and 1
lowest. Each user has given ratings to 20 moves at least.
You can get the date set at www.grouplens.org.
Average Abs olute Error (MAE) is used to evaluate the
forecasting accuracy of this algorithm. MAE is the devia-
tion average of the actual value and the predictive value of
the ratings given by all users to the items. The lower the
value of MAE is, the better the recommendations are. Sup-
posing user rating set is , and the actual user
rating set is , MAE is defined as follows [6]:
12
{ ,,...,}
N
pp p
}
N
q
12
{, ,...,qq
1
N
ii
i
pq
MAE N
(3)
4.2 Test Results and Remarks
We will compare Associative Sets Based Collaborative
Filtering (ASBCF) and traditional User Based Collabora-
tive Filtering (UBCF) during the test process. In order to
verify the results, we will test in two dimensions.
One is to test in different sparse degrees. Here we se-
lect the first 200 users and first 500 items in MovieLens
rating records, and then deduct some records every time
randomly. In the end, we gain rating records under
Copyright © 2009 SciRes JSSM
A Personalized Recommendation Algorithm Based on Associative Sets
Copyright © 2009 SciRes JSSM
403
200*500, 13270 pieces, 7976 pieces, 4525 pieces and 21 6 2
pieces separately, and their sparse degrees are 0.13, 0.08,
0.043, 0.021662 separately, see Figure 1.
fectively than UBCF, and when the number of neighbors
increases, users that are a little further from current user
are also selected, which can increase error. That is to say,
ASBCF is more effective than UBCF. From the results, we can see that in different sparse
degrees, MAE of ASBCF is lower than that of UBCF,
5.2206% in average. That is to say, ASBCF performs
better in every sparse degree than UBCF. However, as
the data is sparse extremely, ASBCF’s accuracy will be
reduced accordingly.
5. Conclusions
This paper proposes a personalized recommendation (col-
laborative filtering) algorithm based on Associative Sets. It
generates a series of frequent itemsets through frequent
itemsets generation algorithm, and then filters out some
noise items that have little relevence with users by merg-
ing, so as to make collaborative filtering algorithm more
effective. It is proved that this algorithm is better than tra-
ditional algorithm in recommendation accuracy. Although
it takes more time to generate frequent items, it will not
influence instant recommendations, as the generation can
be performed offline. Support of frequent itemsets owns
one kind of new information, which represents different
items’ importance. If such information is used in collabo-
rative filtering, forecasting accuracy will be improved, and
this is the breakthrough point for further research.
The other is to compare values of MAE with different
numbers of nearest neighbors. Here we select 10, 20, 30,
40 and 50 nearest neighbors, and the test results can be
seen in Figure 2.
From Figure 2, we can see that ASBCF also performs
better than UBCF with all kinds of nearest neighbor
numbers. However, along with the increasing of neighbor
number, their gap becomes smaller and smaller. Maybe it
is because ASBCF can find nearest neighbors more ef-
0.58
0.6
0.62
0.64
0.66
0.68
0.7
0.72
0.74
0.76
0.13 0.08 0.043 0.02162
Sparse of Data
MAE
ASBCF
UBCF
REFERENCES
[1] J. B. Schafer, J. A. Konstan, and J. Riedl, “E-Commerce
recommendation applications,” Journal of Data Mining
and Knowledge Discovery, pp. 115–153, 2001.
[2] J. Breese, D. Hecherman, and C. Kadie, “Empirical
analysis of predictive algorithms for collaborative
filtering,” In Proceedings of the 14th Conference on
Uncertainty in Artificial Intelligence (UAI-98), pp. 43–52,
1998.
Figure 1. MAE of ASBCF and UBCF under different sparse
degrees [3] L. Zhao, N. J. Hu, and S. Z. Zhang, “Algorithm design for
personalization recommendation systems,” Journal of
Com- puter Research and Development, pp. 986–991,
August 2002.
0.71
0.62
0.63
0.64
0.65
0.66
0.67
0.68
0.69
0.7
10 20 3040 50
umber of Neighbors
MAE
ASB C F
UBCF
[4] Y. Li, L. Liu, and X. F. Li, “Research on personalized
recommendation algorithm for user’s multiple interests,”
Journal of Computer Integrated Manufacturing Systems,
pp. 1610–1615, December 2004.
[5] J. W. Han and M. Kamber, “Data mining concepts and
techniques (Second Edition),” Ming Fang, Xiaofeng
Meng translated, China Machine Press, pp. 149–161,
March 2007.
[6] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item
based collaborative filtering recommendation algo-
rithms,” In Proceedings of the Tenth International World
Wide Web Conference, pp. 285–295, 2001.
Figure 2. MAE of ASBCF and UBCF with different num-
bers of nearest neighbors