A Heuristic Reputation Based System to Detect Spam Activities in a Social Networking Platform, HRSSSNP

doi:10.4236/sn.2013.21005

Paper Menu >>

Journal Menu >>

Social Networking, 2013, 2, 42-45

http://dx.doi.org/10.4236/sn.2013.21005 Published Online January 2013 (http://www.scirp.org/journal/sn)

A Heuristic Reputation Based System to Detect Spam

Activities in a Social Networking Platform, HRSSSNP

Manoj Rameshchandra Thakur1*, Sugata Sanyal2

1Computer Science Department, Veermata Jijabai Technological Institute, Mumbai, India

2Corporate Technology Office, Tata Consultancy Services, Mumbai, India

Email: *manoj.thakur66@gmail.com, sugata.sanyal@tcs.com

Received November 12, 2012; revised December 27, 2012; accepted January 14, 2013

ABSTRACT

The introduction of the social networking platform has drastically affected the way individuals interact. Even though

most of the effects have been positive, there exist some serious threats associated with the interactions on a social net-

working website. A considerable proportio n of the crimes that occur are initiated through a social networking p latform

[1]. Almost 33% of the crimes on the internet are initiated through a social networking website [1]. Moreover activities

like spam messages create unnecessary traffic and might affect the user base of a social networking platform. As a re-

sult preventin g interactions with malicious intent and spam activities becomes cruc ial. This work attempts to detect the

same in a social networking platform by considering a social network as a weighted graph wherein each node, which

represents an individual in the social network , stores activities of other nodes with respect to itself in an optimized for-

mat which is referred to as localized data-set. The weights associated with the edges in the graph represent the trust re-

lationship between profiles. The weights of the edges along with the localized data-set are used to infer whether nodes

in the social network are compromised and are performing spam or malicious activities.

Keywords: Spam; Social Graph; Collaborative Filtering; Weighted Graph; Localized Data-Set; Trust Level

1. Introduction

A considerable amount of work has been done in the area

of spam detection and trust based recommendation sys-

tems for social networking platforms. A brief overview

of these is as follows: [2] suggests a dynamic personal-

ized recommendation system that is based on the trust

between agents. It us es th e con cep t of feedb ack centrality

and overcomes some of the limitations of earlier recom-

mendation systems that use other trust metrics. In the

model suggested in [3] an agent tries to filter interactions

based on the information that it gains from its own social

network. The model suggested in [3] identifies the im-

pact of factors like preference heterogeneity of agents,

network density among agents, and knowledge sparse-

ness which are crucial factors for the performance of the

model. The technique suggested in paper is however dif-

ferent from the earlier two works, in that it makes use of

a weighted social graph [4] to view the relationship be-

tween profiles in a social networking platform . The

technique suggested in [5,6] suggests a reputation based

intrusion detection system to detect malicious and com-

promised nodes in a mobile ad-hoc network. Even

though this work is not directly related to social n etwork-

ing platforms, the approach sugg ested in [5,6] is relevant

to the problem of filtering malicio us and spam conversa-

tions among agents in a social networking platform.

A number of works have been suggested in the recent

past that address the issue of email spam activities and

suggest techniques to combat them. Email spam, even

thought not d irectly related to spam activities, is relev ant

to our work primarily because spam emails are analogous

to spam and malicious activities in a social network. [7]

presents a new email ranking and classification scheme

that makes use the social email interactions to infer the

spam/non-spam status of the sender of any given email.

The rank assigned to an email address based on its inter-

actions represents the reputation of the email address.

The work suggests two level of ranking: global rank

which is recipient email address agnostic and personal-

ized rank that varies based on the email address which is

receiving the email from a particular sender. [8] presents

a email scoring mechanism which is again based on so-

cial interactions and assigns reputation ratings to email

addresses. [9] attempts to discriminate spam activities

from non-spam activities by applying various machine

learning techniques. The classification is based on six

distinct features that the work identifies. [10] attempts to

assign the legitimacy to the sender of an email based on

the features extracted from various email interaction.

*Corresponding author.

M. R. THAKUR, S. SANYAL 43

This work attempts to adopt a learning approach for de-

tecting spam activities.

Intrusion detection, which involves trying to identify

malicious and compromised nodes in a given network, is

similar to the process of identifying compromised agents

is a given social graph representing the way in which

individuals are connected. [11] discusses some of the se-

curity issues associated with distributed computing infra-

structures most of which apply to a social network as

well. Approaches like the ones suggested in [12-16] are

instrumental in not only addressin g the problem of intru-

sion by malicious nodes in a network but are also indi-

rectly helpful in devising similar approaches for spam

and malicious agent detection in a social network.

2. The Social Graph

A social graph may be defined as a graph that represents

the way individual are related to each other on the inter-

net [17]. Even though it represents the relationship be-

tween individuals it doesn’t manifest in any way th e trust

level among individuals. Two individuals might be re-

lated but might no t have a high trust level. The ab ility to

represent the trust level in a social graph can impart a

powerful tool to detect and prevent unwanted interaction.

For example, if A is not related to B but wants to interact

with B. B will try to obtain relevant information from an

individual C with whom B has a high trust level and

based on the inputs B will decide whether to allow A to

interact or not. The suggested approach attempts to de-

rive this trust level among profiles based on previous

interactions and the relationship type and represent the

trust level as weights corresponding to the edges that

represent the relationship between profiles.

3. Collaborative Filtering and

Unwanted/Malicious Activity Detection

In a given social networking platform, the following

holds, “If Profile A is victimized by a malicious interac-

tion by C then the chances of profile B being victimized

by profile C is high”. It is this relationship that has in-

spired the use of collaborative filtering [18] for the sug-

gested approach. The suggested approach adds an addi-

tional constraint that in order for profile B to detect

whether profile C is trying to initiate a mali cious interac-

tion it will only try to take recommend ation from prof iles

which it trusts i.e. profiles with high trus t level.

4. Weighted Social Graph

The suggested approach views the social networking

plat-form as a weighted social graph [4]. Each node re-

presents a profile (an individual), an edge represents a

relationship between profiles and the weight correspond-

ing to the edge represents the trust level among profiles.

Each profile has a localized dataset associated with it th at

holds a table with the following format:

Profile Id (say X)  < incoming activity with X > : <

outgoing activity with X >

We refer to the localized dataset as LD, such that LD

(X) represents the localized dataset entry of profile Y for

profile X, such that the incoming activity with X is rep-

resented as I(X) and the outgoing activity with X as O(X).

Thus,







, LD XIXO X



(1)

The overall view of the weighted social graph can be

depicted as shown in Figure 1.

5. Localized Data-Set and EDGE Weight

Calculation

Corresponding to each profile, the localized data-set re-

fers to a table with the following entries:

 The first column represents profile Ids to which the

given profile (Y) is connected i.e. has a relationship

(X).

 The second column has entries in the following for-

mat (LD(X)):







, LD XIXO X





(2)

Incoming activity, I(X): it represents the activities

which were initiated by X in which the considered profile

was the destination. These interactions include activities

like message sent from X to the considered profile, friend

request from X to the considered profile, comments on a

photo of the considered profile etc. It must be noted that

the type of interaction as mentioned earlier may vary

based on the social networking website considered.

Outgoing activity, O(X): it represents the activities

which were initiated by the considered profile wherein

profile X was the destination. These interactions include

activities like message sent to X, friend request sent to X,

Figure 1. Weighted social graph.

M. R. THAKUR, S. SANYAL

comments on a photo of profile X etc. It must be noted

that the type of interaction as mentioned earlier may vary

based on the social networking website considered.

In real life interactions, the trust among individuals in-

creases over time as the interactions among the individu-

als increase. These interactions are however bi-direc-

tional i.e. they include interactions which are initiated by

both the involved individuals. The suggested approach

employs this concept to calculate the trust levels among

profiles. The trust level of profile X for a connected pro-

file Y is represented by T (X, Y) and T (X, Y) vice versa.







,TYXIX OX 



(3)







,TXYIY OY





(4)

In order to calculate O(Y) and I(Y), the entries in the

localized data-set of X are considered corresponding to

profile Y. A similar procedure is used to calculate O(X)

and I(X). A value for [I(X)/O(X)] that is close to 1 repre-

sents a high trust level of profile X with respect to profile

Y. The value of I(X) increasing at a rate higher than that

of O(X) represents a spam activity initiated by profile X

and vice versa. It must be noted that the suggested tech-

nique will allow first few spam messages, if any, after

which as the value T(X, Y) increases the possibility of a

spam increases and after the upper threshold the interac-

tion will be blocked and marked as spam interactions.

Consider the scenario where say a fake profile A is cre-

ated which sends ou t a friend request to a legitimate pro-

file B. Now in such a scenario for B, O(A) and I(A) are

both 1, since A initiated an outgoing interaction and B

replied to it. However after the first few spam messages

the value [I(A)/O(A)] will increase thus preventing A

form initiating any further spam messages.

In order to illustrate the variation in T(Y, X) and hence

the reputation of profile X with respect to profile Y we

present a graph showing the cumulative number of in-

comeing and outgoing messages by profile Y with respect

to profile X. Figure 2 represents a non-spam normal in-

teraction and Figure 3 represents a spam interaction.

It must be noted that the time unit used for storing the

number of interactions starts with the granularity level of

seconds and as and when the time proceeds the granular-

ity level of the data to be stored increases. For example

one configuration that can be used is to store the counts

on a per second basis for one minute after which store the

counts on a per minute basis for one hour and after which

store the counts on a per hour basis. The flat sections in

the line graphs above represent no interaction. In Figure

2 throughout the time span represented in the graph the

count of I(X) and O(X) is almost the same. As a result the

value of T(X, Y) is close to 1 throughout the time span,

which represents a valid non-spam interaction. In Figure

Figure 2. Graph representing non-spam interaction.

Figure 3. Graph representing spam interaction.

3 however it can be seen that after point P the difference

between I(X) and O(X) increases at a higher rate. Thus

after point P the chances of the interaction between X and

Y being a spam increases. It must be noted that the

graphs presented in Figures 2 and 3 are not based on any

actual data. The graphs have been used to merely illu-

strate the way in which the suggested technique operates.

6. Collaborative Filtering of Interactions

For a given profile if an incoming interaction is initiated

from a profile then the profile first checks if the profile is

connected. If the source profile is connected then the

interaction is accepted only if the trust level between the

two profiles is higher than a predefined threshold and the

localized data-set of each of the profiles is updated. If

however the source profile is not connected then the des-

tination profile tries to find the trust level of the source

profile with a third profile with which the destination

profile has a high trust level. For example if A tries to

interact with B then B will accept the interaction only if

the trust level between A and B is greater than a particu-

lar threshold. However if A is not connected to B then B

tries to derive or infer the trust level from a third profile

C such that the trust level between B and C is higher than

the threshold and C is connected to A.

M. R. THAKUR, S. SANYAL

7. Conclusion

The suggested technique thus addresses the issue of ma-

licious and spam interactions among profiles in a social

networking platform in an effective way by correlating

the scenario with the interactions in the society. The use

of the weighted social graph imparts the suggested tech-

nique the ability to not only view and understand the way

individuals are connected in a social networking platform

but also reflects the trust level among individuals which

helps to filter out malicious and unwanted spam interac-

tions. It must be noted that the suggested technique will

be unable to prevent spam and malicious interaction if

already existing legitimate profiles with high trust level

are compromised. The solution to this problem is outside

the scope of this work however a potential solution to

this problem is the N/R one time password system sug-

gested in [19]. The problems of passwords of legitimate

profiles being disclosed by means of attacks like pass-

word guessing attacks can be addresses by the approach

suggested in [19].

REFERENCES

[1] Social Networking Statistics, URL (last checked 14 Dec

2012)

http://www.internetsafety101.org/Socialnetworkingstats.h

tm.

[2] E. F. Walter, S. Battiston and F. Schweitzer, “Persona-

lized and Dynamic Trust in Social Networks,” Proceed-

ings of The Third ACM Conference on Recommender

Systems, Association for Computing Machinery, New

York, pp. 197-204.

[3] E. F. Walter, S. Battiston and F. Schweitzer, “A Model of

a Trust-Based Recommendation System on a Social

Network,” Autonomous Agents and Multi-Agent Systems,

Vol. 16, No. 1, 2008, pp. 57-74.

doi:10.1007/s10458-007-9021-x

[4] Uniform Resource Locator, “Weighted Graphs.” (last

checked 02 Dec 2012)

http://courses.cs.vt.edu/~cs3114/Fall10/Notes/T22.Weigh

tedGraphs.pdf.

[5] A. K. Trivedi, R. Arora, R. Kapoor, Sudip Sanyal and Su-

gata Sanyal, “A Semi-Distributed Reputation-Based In-

trusion Detection System for Mobile Ad hoc Networks,”

Journal of Information Assurance and Security, Vol. 1,

No. 4, 2006, pp. 265-274.

[6] A. K. Trivedi, R. Kapoor, R. Arora, Sudip Sanyal and

Sugata Sanyal, “RISM—Reputation Based Intrusion De-

tection System for Mobile Ad hoc Networks,” Third In-

ternational Conference on Computers and Devices for

Communications, Institute of Radio Physics and Elec-

tronics, University of Calcutta, Kolkata, 18-20 December

2006, pp. 234-237.

[7] P. A. Chirita, J. Diederich and W. Nejdl, “MailRank:

Using Ranking for Spam Detection,” Proceedings of the

14th ACM international conference on Information and

Knowledge Management, Bremen, 31 October-5 Novem-

ber 2005.

[8] J. Golbeck and J. Hendler, “Reputation Network Analysis

for Email Filtering,” Proceedings of the 1st Conference

on Email and Anti-Spam, Mountain View, 2004.

[9] B. Markines, C. Cattuto and F. Menczer, “Social Spam

Detection,” Proceedings of the 5th International Work-

shop on Adversarial Information Retrieval on the Web,

Madrid, 21-21 April 2009. doi:10.1145/1531914.1531924

[10] H. Y. Lam and D. Y. Yeung, “A Learning Approach to

Spam Detection based on Social Networks,” Proceedings

of the Fourth Conference on Email and Anti-Spam,

Mountain View, 2007.

[11] R. Bhadauria and S. Sanyal, “Survey on Security Issues

in Cloud Computing and Associated Mitigation Tech-

niques,” International Journal of Computer Applications,

Vol. 47, No. 18, Foundation of Computer Science, New

York, 2012, pp. 47-66. doi:10.5120/7292-0578

[12] R. A. Vasudevan, A. Abraham, S. Sa nyal and D. P. Agra -

wal, “Jigsaw-Based Secure Data Transfer over Computer

Networks,” IEEE International Conference on Informa-

tion Technology: Coding and Computing, Las Vegas, Vol.

1, 2004, pp. 2-6.

[13] A. Abraham, R. Jain, S. Sanyal and S. Y. Han, “SCIDS:

A Soft Computing Intrusion Detection System,” In: A.

Sen, et al., Eds., 6th International Workshop on Distri-

buted Computing, Lecture Notes in Computer Science,

Vol. 3326, Springer Verlag, Berlin, pp. 252-257.

[14] S. Sanyal, D. Gada, R. Gogri, P. Rathod, Z. Dedhia and N.

Mody, “Security Scheme for Distributed DoS in Mobile

Ad Hoc Networks,” Technical Report, School of Tech-

nology & Computer Science, Tata Institute Of Funda-

mental Research 2004.

[15] S. Pal, S. Khatua, N. Chaki and S. Sanya l, “A New Trust-

ed and Collaborative Agent Based Approach for Ensuring

Cloud Security,” Annals of Faculty Engineering Hune-

doara International Journal of Engineering, Vol. 10, No.

1, 2012, pp. 71-78.

[16] P. Rathod, N. Mody, D. G., Rajat G., Z. Dedhia, S. San-

yal and A. Abraham, “Security Scheme for Malicious

Node Detection in Mobile Ad Hoc Networks,” In: A. Sen

et al., Eds., 6th International Workshop on Distributed

Computing, Lecture Notes in Computer Science, Vol.

3326, Springer Verlag, Berlin, 2004, pp. 541-542.

[17] Uniform Resource Locator, “Social Graph.” (last checked

on 11 Dec 2012)

http://en.wikipedia.org/wiki/Social_graph.

[18] Uniform Resource Locator, “Recommender System.”

(last checked 10 Dec 2012)

http://en.wikipedia.org/wiki/Recommender_system.

[19] V. Goyal, V. Kumar, M. Singh, A. Abraham and S.

Sanyal, “CompChall: Addressing Password Guess-

ing Attacks,” IEEE International Conference on In-

formation Technology: Coding and Computing, Vol.

1, 4-6 April 2005, pp 73 9- 7 44 .