Communications and Network, 2013, 5, 529-543
http://dx.doi.org/10.4236/cn.2013.53B2097 Published Online September 2013 (http://www.scirp.org/journal/cn)
Copyright © 2013 SciRes. CN
A P2P Platform for Collaborative Aggregated Multimedia
Sharing
Ines Fakhfakh, Hongguang Zhang, Marc Girod-Genet
UMR CNRS 5157, Telecom SudParis, Institut Mines-Telecom , E vry, France
Email: i ne s.fakhfakh@ it-sudparis.eu
Received June 2013
ABSTRACT
Peer-to-peer technologies have emerged as a powerful and scalable communication model for large scale content shar-
ing. However, they are not yet provided with optimized heterogeneous aggregated content management functionality
since they lack rich semantic specifications. To overcome these shortcomings, we elaborated a reference model of P2P
architecture for a dynamic aggregation, sharing and retrieval of heterogeneous multimedia contents (simple or aggre-
gated). This architecture was mainly developed under the CAM4Home European research project and is fully based on
the CAM4Home semantic metadata model. This semantic model relies on RDF (Resource Description Framework) and
is rich (but simple enough), extensible and dedicated for the description of any kind of multimedia content. In this paper,
we detail and evaluate an original semantic-based community network architecture for heterogeneous multimedia con-
tent sharing and retrieval. Within the presented architecture, multimedia contents are managed according to their asso -
ciated CAM4Home semantic metadata through a structured P2P topology. This topology relies on a semantically en-
hanced DHT (Distributed Hash Table) and is also provided with an additional indexing system for offering semantic
storage and search facilities and overcoming the problem of exact match keywords in DHTs.
Keywords: P2P; DHT; RDF; Semantics; Metadata; Semantic Indexing and Search; Multimedia Content Sharing;
Community Networking
1. Introduction
Peer to peer (P2P) systems provide scalable distributed
mechanisms for data sharing and retrieval. There are two
kinds of P2P topologies: unstructured and structured over-
lays [1,2].
Unstructured overlays, e.g. Gnutella [4], organize peers
in a random graph and use flooding to propagate peers
queries. As both data and index items remain locally at
peer level, each query is only evaluated against the local
index. Unstructured networks are therefore not suitable
for reducing lookup costs (the number of involved peers)
by exploiting data correlations b etween data items shared
in a P2P system. Moreover, flooding usually causes com-
munication redundancy and network congestion. In addi-
tion, there is no central coordinator which has knowledge
of all the shared data. Data is therefore not reorganized in
the network and no global structure is maintained [2-7].
So, its hard to provide semantic queries in an unstruc-
tured net work.
Structured overlays, like kademlia [8], carry out a glo-
bally consistent protocol to determine where the distri-
buted resources should be stored by the provider and
cope with scalability and locality issues [1,2,8,9]. Those
protocols rely on hashed identifiers to manipulate shared
documents in an effec ti ve wa y. I f s t r uc t ure d overlays were
developed to improve performances of data delivery,
they are unfortunately more expensive to maintain (e.g.
in terms of time and query number) and are not support-
ing efficient complex queries [2,3,10-12].
Another kind of P2P protocol is hybrid overlays where
some functionality are still centralized , thus attempting to
combine the flexibility of unstructured protocols with the
lookup performance guarantees of structured ones [2,13-
15].
There is unfortunately not yet a fully semantic-oriented
P2P community network for wide area multimedia con-
tent sharing, even if some research woks have already
been conducted in the P2P overlay world [16-20]. The
only existing generic semantic metadata model for de-
scribing multimedia contents comes from the CAM4Home
project [21,22]. However, this model was never combined
with existing P2P systems for improving large scale mul-
timedia content sharing and retrieval. Consequently peers,
in existing systems, do not have a global knowledge
about shared multimedia contents similarities and rela-
tionships.
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
530
Furthermore, peer’s local knowledge about shared con-
tents can actually be expressed in various formats such as,
for example: Web pages (unstructured), text documents
(unstructured), XML (semi-structured), RDF and OWL
(structured), etc. This obviously depends upon the pro-
viders that generally handle the content in a proprietary
way, which introduces heterogeneity. Therefore, in the
context of efficient content aggregation and sharing col-
lective knowledge composition, data knowledge must be
expressed in a machine manageable/inferable format (e.g.
RDF or OWL) and also in a generic and extensible way
for heterogeneity management purposes. Otherwise, all
the data that are not expressed in this generic format will
have to be processed and converted in that generic format
(metadata extraction). Let us note that metadata about
data are mainly expressed in RDF format which is a
W3C standard [22].
In our work, we focus on the interactive sharing and
delivery of multimedia content in P2P community net-
works (a group). We investigate structured P2P networks
using uniform hashing, often called Distributed Hash
Tables (DHT). To address the flexibility and efficiency
shortcomings, we combine their P2P mechanisms with
semantic systems. We aim at proposing an enhanced se-
mantic-based P2P architecture relying on generic struc-
tured descriptions for content heterogeneity management.
The retained descriptions are logically the RDF-based
CAM4Home semantic Metadata since they have been
specified for describing any kind of multimedia contents
(simple and aggregated) in a generic way [22].
To interconnect users in our P2P community, we rely
on the commonly used structured P2P system called Pa-
stry [23]. A Pastry system is a self-organizing overlay
network of nodes which performs application-level routing
and object location in a potentially very large overlay
network of nodes connected via the Internet. It is a fully
decentralized, fault-resilient, reliable and scalable build-
ing block currently used for in particular the design of
large scale P2P file sharing and group communication
systems [24,25]. On top of Pastry, we carry out a scalable
application-level multicast overlay infrastructure called
Scribe [25]. Scri b e i s a top ic -based publish/subscribe (event
based) system where any peer can create a topic and the
other peers can then subscribe to the topic (i.e. register
their interest in the topic). Any Scribe peer can therefore
publish events and Scribe efficiently disseminates these
events (application level multi-point delivery) to all the
topic’s subscribers [25].
On top of the two aforementi oned systems (Pastry and
Scribe), we finally introduce a new layer that carries out
the following mechanisms for enhancing the DHT se-
mantic function alities: an or iginal metad ata index ing algo-
rithm and an original metadata semantic searching me-
chanism (handling incomplete information searching). This
layer not only extends the DHT with semantic search but
also allows the overcoming of the problem of exact match
keywords.
The paper is organized as follow. Section 2 will de-
scribe the proposed community network architecture. Then,
Section 0 will detail the original semantic indexing and
search algorithms added to this community. Section 4
will summarize the implementation of the proposed P2P
community network. This community networks was in-
tegrated within the CAM4Home ITEA2 project pan-Eu-
ropean test-bed. Finally, Section 0 w ill pr esent some evalu-
ation res ul ts of the pr oposed P2P communi ty.
2. Overview of the P2P Community Network
Architecture
2.1. General Architecture
The proposed P2P community network has been design
as a subsystem of the CAM4Home network architecture
([21]) that relies on peer-to-peer networking technology.
The subsystem consists of connected digital home envi-
ronments, being comprised of user terminals (such as mo-
bile phones, communicating terminals, tablets and PCs)
and dedicated community network servers. The purpose
of the proposed community network is to provide a ser-
vice enabler for aggregated multimedia content manage-
ment to distribute and retrieve content in a peer-to-peer
network. This community network provides the commu-
nity and user management with additional features such
as eventing, social grouping and communication between
the users. These functions or basic services are referred
to as the P2P community network functions.
The proposed community network has two main
components: the Community Network P2P Modules and
Functions and the Community Network Server. The
Community Network P2P modules offer a set of func-
tionalities to create peer groups for shar ing and retrieving
data in P2P overlays. The Community Network Server,
also called Community Gateway, can be viewed as an
extended Community Network P2P Module that allows
non P2P clients and end users to access P2P community
network functions. This Community Gateway is provided
with a GUI and extended with a Community Network
Service (CNS) module. The CNS module is designed in a
SOA compliant way and provides non Community Net-
work enabled components and processes with a basic set
of interacting and interfacing functions with Community
Networks. The CNS also implements eventing and pro-
vides publish/subscribe functionalities for handling the
dynamics of shared multimedia content within communi-
ties.
Figure 1 summarizes the high level architecture of the
Community Network. The Community Network P2P
Modules and Functions depicted in this figure is com-
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
531
posed of networking and application level routing func-
tions relying on DHTs and on publish/subscribe para-
digms. It implements functions for initiating and inte-
grating a peer in a P2P community, as well as for pub-
lishing, distributing and sharing contents within a com-
munity. Pastry and Scribe have been selected to provide
content and metadata storage and retrieval [24,26]. Pastry
DHT, named PAST, is used as the Community Network
DHT. Scribe is used, on top of Pastry, to bu ild the group s
forming the P2P communities.
The community Gateway depicted in Figure 1 pro-
vides a generic community interaction and management
interface, as well as content sharing and delivery to/from
the P2P communities. Its main building blocks are the
following:
Community Network (CN) P2P Modules and Func-
tions.
Community Network Service.
Community Gateway GUI: user-friendly graphical in-
terface for end users of P2P communities, i.e. the
community members.
Search Aggregation Client: client of the Search Ag-
gregation service that is designed and implemented,
as a Web Service, for allowing the search of multi-
media content over the Internet through a natural lan-
guage and keyword-based query system. The search
result is a description of a content that matches the
search attributes, given in the CAM4Home Metadata
model description format [22]. The Search Aggrega-
tion Client is also introduced for populating Commu-
nity Networks with external content. For example,
when a search operation performed by a community
member fails, the Search Aggregation service could
be automatically invoked and any external content re-
trieved can be stored in the c ommunity and provide d to
memb e rs.
Metadata Management Client: introduced for allow-
ing the Community Network Server to validate Con-
tent Metadata descriptions that are published in our
Community Networks. Th ese C onten t des cripti ons need
to be validated prior to their publication within com-
munities in order to verify their conformance to the
CAM4home Metadata model.
2.2. P2P Content sharing and Delivery
Architecture
2.2.1. C onte x t and Rel ate d Work
There are several previous works on P2P architectures
and RDF data indexing and retrieval [26], some of them
are presented below:
Felber et al [27] have proposed an architecture to
access a file f using less specific queries. Every file f
is described by an XML document and characterized
by a most specific query q. The file f is stored in the
node respons ible for the key k = h(q) where h() is the
hashing function. The users can access the file by ge-
nerating queries close to q. Unfortunately this ap-
proach limits the search to a small number of appro-
priate queries.
Shen et al. [28] have adopted a super-node based P2P
architecture and developed the Building Hierarchical
Summaries” algorithm for representing the semantic
of a file by document’s summaries called Ddoc. All
Ddoc are held in a global group and peer index so that
the query is efficiently transmitted to the most rele-
vant peer. Unfortunately, this idea is only suitable to
Super peer architecture.
Zhu et al. [29] have proposed an approach to conduct
efficient semantic search on DHT overlays. Their ba-
sic idea was to place indexes of semantically close
files into the same peer nodes with high probability
Figure 1. Community Network high level archit ecture.
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
532
by adding major components on top of a DHT: ex-
tractor registry and semantic indexing and locating
utility. Each file is identified by a semantic identifier
“semID”. However, this approach produces identical
semID for similar files and queries.
Liarou et al [30] rely on evaluating conjunctive Triple
Pattern Queries over Large Structured Overlay Net-
works. They proposed the Query Chain algorithm (QC).
QC main characteristic is that the query is evaluated
by a chain of nod es. Inter mediate results flow throug h
the nodes of this chain and finally the last node in the
chain delivers the result back to the node that submit-
ted the query.
The distributed architecture that we have proposed and
implemented within CAM4Home project (described in 0)
aims at overcoming the weaknesses of the aforemen-
tioned approaches.
2.2.2. Proposed Architecture
We propose a distributed architecture based on semantic
metadata for indexing multimedia contents. Our archi-
tecture relies on structured P2P networks and allows se-
mantic queries on metadata. The structured P2P networks,
usually referred as Distributed Hash Tables (DHTs), of-
fers a lookup service similar to a hash table by providing
a mapping between a pair (key, value). The goal of the
proposed approach is to provide a fully distributed sys-
tem that exploits the scalability and efficiency of DHTs
in order to index and retrieve multimedia contents through
their semantic metadata. It is worth mentioning that this
scheme does not aim to build a distributed metadata re-
pository but to index the metadata by the RDF triples that
are distributed in the DHT-based storage for querying or
searching. In terms of implementation, we use a struc-
tured P2P system based on Pastry [23]. Pastry provides
self-organization, scalability and fault-tolerance. On top
of Pastry we carry out an application-level multicast in-
frastructure relying on Scribe [25]. Scribe is a top ic-based
publish/subscribe system. Any Scribe peer can create a
topic and other peers can subscribe to this new topic (i.e.
register their interest in the topic). Any Scribe peer can
publish events. These events are disseminated to all the
topic’s subscribers using Scribe dedicated functionalities.
Scribe uses the DHT of Pastry to manage topic creation
and subscription. For each topic, Scribe builds a multi-
cast tree over Pastry to disseminate the events published
in the topic. Each peer can act as an event publisher, a
root of a multicast tree, a subscriber to a topic, a peer
within a multicast tree, or any combination of th ese roles
[25]. This is summarized in Figure 2. The Pastry DHT,
named PAST, is used to distribute and share the multi-
media content among the peers of the communities.
3. Metadata Indexing and Searching Within
the P2P Community Network
Architecture
In this paper, we propose two approaches for indexing
and retrieve RDF metadata in a DHT. They are described
below.
3.1. Metadata Indexing and Searching: First
Approach
3.1.1. Metadata Indexing
In our first approach, we have chosen to index metadata
using configuration file. In this file, we precise the rele-
vant attributes that the system will use as keywords to
hash in the DHT. For each RDF metadata file, we:
Figure 2. Scribe/pastry architecture overview.
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
533
parse the file,
pick out the keywords to use in the DHT storage, ac-
cording to the configuration file,
store the relevant information about the owner peer
and the metadata file so it can be accessed from any
other peer in the overlay network.
This indexing algorithm is mapped in Table 1.
Let f(a, n) be the cost of running the algorithm, where
a is the number of attributes in the configuration file, and
n is the number of nodes in the DHT.
We have:
f(a, n) =
0
(1log( ))
a
k
n
=
+
f(a, n) = a + a.log(n)
f(a, n) = a.(1 + log(n))
f(a, n) = O(a.log(n))
Therefore, the Execution Time of our first algorith m is
in O(a.log(n)).
3.1.2. Semantic Searching
Since a description file is stored according to the hash
values of some keywords, the user is able to retrieve data
by querying the desired ones. Search operation uses the
DHT network to lookup metadata and to access to related
data.
In order to search content and retrieve its RDF file
metadata, the system:
Searches metadata available in DHT that is related to
user’s selected keywords,
Finds the physical location of the metadata,
Retrieves the physical location of the Metadata and
the data (image, video, text),
And finally displays it on the remote Community Net-
work Client Interface.
These operations are depicted in Figure 3 which
summarizes all the search mechanism steps. They are the
following:
Condition: Peer A publishes his content (a link to a
CAM Element). This content will be routed to the re-
sponsible peer whose Id is closest to the file’s iden-
tifier (Peer B).
Another peer (Peer C) performs a search operation,
Table 1. Algorithm of Indexing Metadata using configura-
tion files.
Precondition: Peer disposing of configuration files
1. for all attributes in the configuration file
2. {
3. id = hash(O);
4. put(id, Information);
5. }
Figure 3. Ste ps of Sear c h Metadata (first scenario).
The peer responsible for the file Id (Peer B) returns
the Metadata file URL,
Peer C access to the Metadata file stored in Peer A
using the file Id del ivered by P e e r B.,
The Metadata file contains all the information related
to a specific resource. So, Peer C can extract
The physical location from the Metadata file. The data
is now reachable by Peer C.
In a traditional DHT, peers reference the multimedia
content by its name. So, only users knowing exactly what
they are looking for are likely to acc es s it. In that context,
our first investigation was to index the multimedia con-
tent according to some relevant attributes of the RDF
metadata file. By the way, users are more likely to dis-
cover the RDF metadata description of the multimedia
content which becomes easily reachable. However, the
remote access to the metadata files increases the response
time within the community Network. For this reason, we
propose a second approach which consists of storing the
whole RDF description in the DHT. In this way, we not
only enhance the response time, but also the semantic
aspects. In fact, since the entire RDF description file is
stored in the DHT, the user can retrieve it from any of its
attributes. The following section will present in detail this
second approach.
3.2. Metadata Indexing and Searching: Second
Scenario
3.2.1. Metadata Indexing and DHT Storage
The proposed metadata indexing scheme is composed of
two steps. The first step carries out an RDF triple-based
iterative indexing algorithm for multimedia addressing
and discovery, thus providing a mapping for multimedia
metadata spread among the distributed peers. This step is
characterized by a hierarchical scheme in which the me-
tadata are organized through RDF triple. The second step
carries out an RDF triple-based storing scheme for me-
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
534
tadata querying and searching.
An RDF document can be decomposed into a hie-
rarchy set of atomic RDF statements. However, there is a
discrepancy between metadata representation and storage
model in a DHT-based infrastructure. The metadata is
serialized as a hierarchical structured RDF/XML, while
the data storage is codified in a flat pair of (key, value).
To fill up this gap, we have proposed a RDF triple-based
hierarchical indexing scheme. By using hierarchical triples,
the CAM metadata can then be addressed iteratively and
recursively. Let us take a CAM Object as an example.
This object is decomposed into the triple tree T00; T10;
T11; T12; T13; T14; T20; T30, as shown in Figure 4.
In order to index metadata, we iteratively put into the
DHT overlay a set of (key, value) pair in the form (idmn,
idij). Here, idij is generated by hashing the triple Tij, while
idmn is done by hashing each descendant of Tij. The me-
tadata is identified by idCAM, calculated by the hash func-
tion on the metadata itself. Th e entity idCAM is associated
with T00 by (id00, idCAM). In order to support efficient
queries on distributed RDF triples, we exploit an overlay
structure to build a distributed index for these triples.
Given the generic RDF triple Tij with the form Tij = (Sij,
Pij, Oij), it is registered into DHT three times by hashing
Sij, Pij and Oij respectively. In that way, triple Tij can be
queried through Sij, Pij or Oij. Since the value of attribute
subject” and predicate” must be a URI which is a string,
the hashing function of DHT is directly used to map the
values of subject and predicate to the m-bit identifier
space. For object”, its type can be URI and literal. In
case of a String type, the same hashing function is ap-
plied, while the locality preserving hashing used in [31]
is applied to numeric types. The whole indexing algo-
rithm is detailed in Table 2.
Figure 4. RDF triple-based tree Structure of CAM object.
Table 2. Algorithm of Indexing metadata by RDF Triples.
Precondition: RDF Metadata file has been decomposed into the triple tree like in Figure 1.
1: for all triples in the tree
2: {
3: idij = hash(Tij);
4: do
5: put(idmn, idij);
6: while Tij has a descendant triple Tmn
7: }
8: idCAM = hash(CAM); /* Hash the metadata itself */
9: put(id00, idCAM);
10: put( idCAM, CAM);
11: for all triples in the tree /* Deposit triples by Subject, Predicate and Object */
12: {
13: idij(S) = hash(S); put(idij(S), Tij);
14: idij(P) = hash(P); put(idij(P), Tij);
15: idij(O) = hash(O); put(idij(O), Tij);
16: }
CAM Object
Metadata
T00
(C4H#O;4a07;0, rdf:type,
core:DocumentElementMetadata)
T11
(C4H#O;4a07;0,
core:creationDateTime, "2009-06-
29T09:16:28Z"^^xsd:datetime)
T10
(C4H#O;4a07;0,
core:camElementID,
"4a07"^^xsd:string)
T13
(C4H#O;4a07;0, core:title, "TIME
Special Commemorative Edition
Michael Jackson"^^xsd:string )
T12
(C4H#O;4a07;0,
core:isMetadataOf,
genid:A23214)
T20
(genid:A23214,
core:DocumentElement,
genid:A23215)
T30
(genid:A23215,
core:essenceFileIdentifier,
"ed2k://..."^^xsd:anyURI )
T14
(C4H#O;4a07;0, core:description,
"TIME ... on Monday, June
29."^^xsd:string )
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
535
The decomposition of metadata into triples is very fast.
In fact, the execution time of this algorithm is logarith-
mic. Let f(n) be this execution time, where n is the num-
ber of nodes in the DHT.
Let m be the number of triples in the RDF file.
We have:
f(n) = m.log(n) + log(n) + log(n) + 3.m.log(n)
f(n) = log(n).(m + 1 + 1 + 3.m) = 4.m.log(n)
Since m << n, we have finally the following execution
time:
f(n) = O(log(n))
3.2.2. Semantic Searching
Metadata search consists in locating CAM Element given
one or more of their attributes, while metadata query cor-
responds to the extraction of some required information
from metadata. We have focused on an approach that ex-
tends P2P community system with searching capabilities
by exploiting both the triple-based indexing algorithm and
the knowledge-based RDF query languages. Based on se-
mantic query languages like SPARQL and the triple-
based indexing scheme, searching multimedia in the pro-
posed system can be easily achieved as follows: since
RDF triples are distributed on P2P DHT overlay, we re-
solve RDF queries by the way as presented in [32]. As
querying results, RDF triples are formed to retrieve CAM
Bundle (or CAM Object) on which they have been in-
dexed. Table 1 presents the pseudo-codes of metadata
searching algorithm.
Let f(n, m, r) be the complexity of the a Search algo-
rithm, where n is the number of nodes in the DHT, m is
the number of triples in the RDF tree and r is the number
of results. The complexity of this algorithm is calculated
as follows:
f(n, m, r) =
11
(1log( )(1log( ))
rm
kl
nn
= =
+ ++
∑∑
f(n, m, r) =
1
(1log( )(1log( ))
r
k
nm n
=
+ ++
f(n, m, r) =
f(n, m, r) = r.(1 + m)(1 + log(n))
f(n, m, r) = O(r.m.log(n))
If we suppose that r<m then we finally have:
f(n, m, r) = f(n, m) = O(m log(n))
Common RDF query patterns are described in terms of
relationships. For instance, if the users wants to know
everything about a particular resource, it is S-Query in
the form of (S, P?, O?). Together with similar P-Query
and O-Query, they are called atomic queries. Complex
Table 3. Metadata Searching Algorithm.
Algorithm 2: CAM Bundle or Object Searching Algorithm
Precondition:
a) Metadata are indexed using Algorithm depicted in Table 2;
b) Search metadata with the social tag value “Mark Knopfler”
1: RDF Query
2: SELECT ?SocialTag WHERE {?SocialTag core:tagValue
“Mark Knopfler”^^xsd:string}
3: for all resulting triples
4: {
5:
key
=
hash
(
SocialTag
core:tagValue
“Mark
Knop-
fler”^^xsd:string);
6: Triple = get(key);
7: do
8: {
9: index = hash(Triple);
10: Triple = get(index);
11: } while Triple has the parent
12: CAM = get(Triple);
13: }
queries can be resolved based on atomic query. For ex-
ample, PO-Query (S?, P, O) is resolved by P-Query and
O-Query to filter the intersection triples. In the above
algorithms, RD F triples ind exed by Subject, Pred icate and
Object are stored three times in three peers. Through this
way, the knowledge in semantic metadata can be distri-
buted over DHTs, which makes any query be resolved.
However, it also results in hot spotwhere a specific
peer responsible for a popular index holds too many
triples. What’s more, the routing cost for resolving com-
plex query is multiple to that of atomic query. Thirdly,
we are not aiming to use the RDF triple-based scheme to
distribute metadata but to provide metadata search capa-
bilities based on RDF query. Consequently, information
in the index scheme should contain only the relevant
subset of data in order to create an efficient index struc-
ture. Essentially, the searching algorithm presented pre-
viously is for querying on leaf triples in O-Query or PO-
Query, and then recursively for getting the root triple in
order to retrieve the entire metadata. Therefore, we op-
timize to store the leaf triples without descendant by
hashing O and PO. Figure 5 summarizes the new steps
of the search mechanism.
Condition: Peer A publishes his content (the RDF me-
tadata file of the CAM Element). The content will be
routed to the responsible peer who Id is closest to the
file’s identifier (Peer B).
Another peer (Peer C) performs a search oper a tion.
Peer C performs a search operation by Subject S, Pre-
dicate P, Object O, (S, P), (P, O), (S, O) or (S, P, O),
The peer responsible for the file Id (Peer B) returns
the metadata content,
The metadata file contains all information related to a
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
536
Figure 5. Steps of Search Metadata (second scenario).
specific resource. So, the peer C can extract the physical
location from the metadata file. The data is now reacha-
ble by the peer C.
3.3. Incomplete Information Semantic Search
Peer-to-peer Dis tributed Hash Tab le (DHT) syste ms greatly
improve the scalability and exact-match accuracy of P2P
systems. Those systems make it simple to discover spe-
cific data when their complete identifiers-or keys-are
known in advance.
In practice, however, users looking up resources don’t
know the exact match identifiers stored in P2P system.
Since our proposed approaches of RDF descriptions in-
dexation and search are based on DHT storage, this DHT
shortcoming is still a problem. Indeed, although the me-
tadata file can be localized based on one of its attributes,
user still need to introduce the correct sentence . The goal
of this approach is to provide a distribute system which
can make a mapping between user’s incomplete informa-
tion and keywords to resolve the problem of the search
by exact keywords. This approach introduces modifica-
tions in the indexation and in the search of data. Those
modifications are presented below.
3.3.1. Indexation
In our proposed indexation approach, we include a me-
chanism to do the spelling which is based on distributed
indexes build when storing the metadata in the DHT. In
fact we include two additional indexes:
The first index is indexing the relevant keywords
stored in the DHT. Those keywords correspond to the
object of each triple (S, P, O) in the RDF file descrip-
tion. This index contains the exact match keywords
that a user is able to need for his search.
The second index is introduced because the exact
match keywords could be single words or whole sen-
tences. However, to do the spelling, we need to verify
if each word is well spelled. So, we create a second
index regrouping the exact match keywords split into
words.
3.3.2. Search
When performing a search operation, a user could miss-
pell data or provide only partial information. In that case,
we have included a mechanism to correct and complete
user’s keywords.
Before searching in the DHT, we make sure that we
are using a keyword already stored. In fact, we perform
the following steps:
1) The system corrects the user’ keywords, if they are
misspelled, and finds the corresponding exact match key-
words stored in the DHT. In fact, we make sure that the
user have spelled correctly a keyword or a part of key-
word stored in the DHT (we will use the first ind ex). The
system returns all the possible words closest to the user’s
proposition.
2) Find the exact match keywords stored in the DHT
(using the second index). Find the whole keyword that
contains a part of user’s proposition.
3) Search in the DHT and find identifiers of the CA-
MElements stored in the system.
This approach is incorporated in the search operation
to improve the flexibility of our system. We have intro-
duced a spell checking step into the search algorithm
(according to the indexes inserted into the indexation
stage) for finding the exact keyword from partial infor-
mation, i.e. for finding the RDF metadata description
from incomplete or incorrect information.
As the second approach is semantically richer than the
first one, we will execute the semantic searching using
our SPO Algorithm with addition al step to suit the partial
or misspelled words to the correct one as illustrated in
Figure 6.
Figure 6. Steps of Search Metadata (second scenario).
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
537
4. Implementation
The implemented P2P Community Network is depicted
in Figure 7, in cluding the Community Network GUI. The
main components of this architecture are the Community
Network Peer, the Community Network Service and the
Community Network GUI. The community Network Serv-
er is a specific Community Network Peer allowing non
P2P-enabled entities to interact with P2P communities
through the Community network architecture. It acts as a
community Gateway and implements additional functio-
nalities for serving non P2P clients. These components
are fully specified in [33] and are therefore only su mma-
rized below, mainly in terms of implemented functional i-
ties. All the aforementioned mechanisms of indexing and
searching (see paragraph 3) have been implemented and
tested in real conditions on the large scale demonstrator
of the CAM4Home project [21]. The P2P Community
Network implementation also relies on AJAX and AJAX-
Push/Comet framework [34] and ICEfaces Framework
[36].
The Community Network Server, also called P2P Ga-
teway, enables a non-P2P Client to interact with P2P
Community networks for creating or joining communi-
ties, publishing and discovering bundles (i.e. aggregated
multimedia content) in the P2P network. The developed
Server takes into account the heterogeneity of multime-
dia content (audio, video, text) and includes a publish/
subscribe model that notifies users about bundle elements’
changes. The functions provided by the Community Net-
work Peer and the community Network Server are:
Figure 7. Community Network and GUI: overall architecture.
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
538
Create Node
To Create a Scribe Node, we use the Pastry Create
Node functionality and message. The address and the
port of an existing node (the bootstrap node) have to be
specified as inputs, and the system returns the Node Id of
the new node. This Node Id is automatically generated
by the P2P system. The created node can also be used
afterwards as a bootstrap node.
Create community
Our architecture offers to users the possibility of creat-
ing communities of interests. A P2P user can create a
community with has a significant name called communi-
ty topic. This would allow other users interested in that
topic to subscribe in the corresponding community and
share data.
To create a community called Community Name, a
P2P node sends a CREATE (Community Name) message
to the Scribe system. The message is received by the
node having the closet ID to Hash (Community Name)
(Hash beeing the hashing function). This node is called
the Rendezvous node of the said community.
Join Community
When a user is interested in a given community, she/
he can subscribe and join it. The user would be able to
receive the notification messages from community mem-
bers. A P2P client has to perform a join operation to
subscribe to the chosen community.
For this purpose, scribe node sends a JOIN message to
the Group Id corresponding to the community it wants to
join. Pastry routes this message to the rendezvous point
using the forward method. There are two distinct situa-
tions: if the intermediate node is already a forward node,
add this node as a child, if the intermediate node is not a
forward node, create a child table of the group and add
the node.
Leave Community
When a user is no more interested by a community, he
can leave it and triggers at its Scribe peer a subscription
removal. When a Scribe node wants to perform a sub-
scription removal from a peer group, it records locally
that it left the group. Then, it checks if th ere are no other
entries in its children table. If so, it sends a LEAVE
message to its parents in its multicast tree. The message
then travels recursively up the multicast tree until it
reaches a node that has no more knowledge, in its routing
table, of the node be i ng rem ove d .
Search Community
Every user connected to the system can perform a
search operation to discover the communities already
created in the system. The user can join an existing
community or choose to create a new one.
Publish Metadata
This function allows users to publish and share their
contents within a specific community. When publishing a
multimedia content within a community, the system will
execute all the aforementioned indexing algorithms for
storing its associated metadata information with d ifferent
hash ke ys within the DHT.
Search
The search functionality is implemented as already
described in Section 3.3: the search is performed using
semantic queries expressed in XML and is simplified by
the use of the indexation mechanism.
The Community Network Service is the Server-side
application allowing any component or application to
remotely interact with communities through the Commu-
nity Network Gateway. It is designed in a SOA compli-
ant way and implements Web Service components. For
modularity reason, it is not directly implemented with the
Community G a tewa y, b ut in teracts with th is latter throug h
an RPC interface and the P2P Module XML RPC Server
of the Community Gateway.
The Community Network GUIs, designed as modular
web interfaces using AJAX framework [35], are syn-
chronized in real time with Community Network GWs
through AJAX push events and synchronization tunnels
established between AJAX Push Servers of Community
Network GWs and AJAX-Push Clients of Community
Network GUIs. Let us recall that the synchronization
between Community Network Gateways is carried out
using the eventing mechanisms embedded in the Scribe/
Pastry P2P framework. When the Community Network
GUI is loading, a first web page is displayed to allow the
end user to enter is User Id and access credential. This
step is required for Authentication and Authorization
purposes since only registered and authorized user can
interact with Community Networks. If and only if the
authentication step succeeds, the user will be redirected
to the Community Network GUI main Web page. This
main Web page, depicted in Figure 8, is decomposed in
4 main sub-parts, showing available communities, end
user communities, end user local content, and retrieved
content.
Through the Web interface, an end user can:
Select an available community from the list of availa-
ble communities for joining purposes,
Create a community,
Leave one of its belonging communities,
Select one of its available contents and publish it in
one of its communities,
Search for a content in one or any of the existing
communities. The end user just enters a natural lan-
guage query in a dedicated field and click on the
Search button. All the retrieved content will be auto-
matically displayed,
Select content from the Search Content sub-part and
click on the Download button. This content is then
downloaded locally in a dedicated directory.
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
539
Figure 8. Community Network GUI Web page.
5. Evaluation
P2P systems are popular to be scalable and fault toler ant.
However, they lack of semantic which make search in
P2P systems (especially DHTs) difficult for users. Our
first method already introduces more semantic than com-
mon DHTs. However, this semantic is not so rich. We
have therefore carried out our second method that im-
plements a DHT based on enriched semantic. Only this
second approach will be evaluated in that section.
Our implemented architecture relies on Scribe and Pa-
stry. Those frameworks were already fully evaluated in
[24,37] where the authors have shown their large scale
scalability and efficiency. They will therefore not be ree-
valuated in this section where we will only validate our
architecture by analyzing the impact of introducing our
semantic storage/se arch mecha nisms on the sto rage/search
times and on the efficiency of our community network
architecture.
Our community Network architecture is designed to
handle complex multimedia contents which regroup sev-
eral multimedia objects (video, image, text…) in a single
bundle. Each bundle is described by an RDF documents
identifying all necessary elements to reconstruct a com-
plex multimedia content. By the way, we could find
many photos and videos characterizing the same topic or
event in barely 20 seconds as described in Figure 9. To
evaluate our architecture, we use well established meas-
ures to calculate the response time needed to restore mul-
tiple bundles from the DHT in the same time. We find
out that our system is accurate en ough to restore all bun-
dles related topic ba sed on a single mundane keyword or
less on an incomplete or misspelled keyword.
The search time needed for the system to get results
varies according to the keyword introduced by the users.
To have an accurate result, we calculate the mean value
of search times. For the same result, we calculate the
search time needed by the system to retrieve bundles
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
540
Figure 9. Evaluation of the occurrence of match.
based on different keywords. We have done the follow-
ing experiments:
We search all documents published in a specific com-
munity. We find out that the research time rises mod-
erately.
We search one or more documents related to a spe-
cific keyword.
We search one or more documents according to in-
complete or misspelled keywords. The research time
is more important because of the size of the index
used.
Moreover, the search time varies according to the
depth of the RDF tree metadata file describing the mul-
timedia bundle. The more the RDF file is deep, the more
the Bundle is rich. Figure 10 illustrates the variation of
search time depending on the tree depth, in other words,
depending on the complexity and richness of the multi-
media content.
In this work, we are interested in introd ucing a seman-
tic search mechanism to DHT. To have a flexible search,
we have added a mechanism to correct misspelled or
incomplete user keywords. We calculated the search time
to extract the same content based on exact match key-
words and on incomplete information. The slight extra
time needed to do the search, shown in Figure 11, de-
pends on the index size, which is bigger since users pub-
lish lots of contents. Moreov er, as soon as the user intro-
duces a wrong word, the correction mechanism will gen-
erate several possibilities of exact match keywords. This
will generate in turn different combinations to recover
the RDF metadata file from the RDF triple-based tree.
However, when we choose the exact match keywords
of a single specific content, the research time is constant
whatever the number of metadata content stored in the
DHT is and it increases moderately when the keywords
only belongs to several metadata files as illustrated in
Figu re 12. The yellow curve corresponds to the research
times of a single content using the exact match keyword,
whereas the green one corresponds to the research times
of all the contents using the wildcard search attribute.
Moreover, the relevance of the answers is an important
evaluation criterion. Thus, in our implemented architec-
ture, the user could choose the precision of correcting
words he wants (from 0.1 to 0.9 and 1). This precision
influences the accuracy of results but not the response
time. A good tradeoff between search flexibility and re-
sponse time can be obtained by fixing the precision to 0.7.
The precision of 1 corresponds to an exact match search.
In conclusion, introducing metadata and semantic as-
pects in the DHT ease the user’s search, since the multi-
media content is reachable from any mix of complete/
incomplete information specified in its metadata f ile.
6. Conclusions
In this paper, we have presented and evaluated an origi-
nal semantic-based community network architecture for
heterogeneous multimedia content wide area sharing and
retrieval. The use and the storage of CAM4Home seman-
tic Metadata as content descriptions within the P2P Com-
munity (distributed in a semantic DHT) ensure that all
members shared a common understanding of content.
We have enhanced the P2P communities with novel
indexing techniques for indexing the data stored in the
peer-to-peer network (indexes are distributed across the
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
541
Figure 10. Search Time e valuation based on the RDF tree depth.
Figure 11. Additional search time for Misspelled keyword correction.
network and contain key-to-key or query-to-query map-
pings). Besides, the exact match lookup needed for search
in DHT has been overcome by the introduction and use
of two new indexes in the search mechanism. The com-
munity Network service design has also moved towards a
more semantic oriented framework based on Subjects/
Predicates/Objects search algo rithm.
The proposed mechanisms allow our P2P Community
to provide efficient multimedia content indexing (distri-
buted) and retrieval mechanisms, at peer and Community
Gateways level, thus offering full semantic distributed sto-
rage and search functionalities to users. The evaluations
and tests conducted and depicted in this paper show that
those mechanisms can be carried out in a scalable way.
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
542
Figure 12. Constant search time to find an exact match content.
REFERENCES
[1] K. Lua, J. Crowcroft, M. Pias, R. Sharma and S. Lim, “A
Survey and Comparison of Peer-to-Peer Overlay Network
Schemes,” IEEE Communications Surveys & Tutorials,
2005.
[2] M. Castro, M. Costa, A. Rowstron, Peer-to-Peer Over-
lays: Structured, Unstructured, or Both?Technical Re-
port MSR-TR-2004-73, Microsoft Research, Cambridge,
2004.
[3] M. Castro, M. Costaand and A. Rowstron, “Debunking
Some Myths about Structured and Unstructured Overlays,”
Proceedings of NSDI’05, Boston, May 2005.
[4] D. Stutzbach and R. Rejaie, “Characterizing Today’s
Gnutella Topology,” Proceedings of ACM SIGMETRICS
2005, Alberta Canada, June 2005.
[5] Y. R. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham
and S. Shenker, “Making Gnutella-Like p2p Systems
Scalable,Proceedings of SIGCOMM’03, Karlsruhe,
August 2003.
[6] Q. Lv, P. Cao, E. Cohen, K. Li and S. Shenker, Search
and Replication in Unstructured Peer-to-Peer Networks,”
Proceedings of ACM ICS 2002, New York City, June
2002.
[7] S. Saroiu, P. Gummadi, S. Gribble, et al., “A Measure-
ment Study of Peer-to-Peer File Sharing Systems,” Pro-
ceedings of ACM MMCN 2002, San Jose, California,
January 2002.
[8] P. Maymounkov and D. Mazières, Kademlia: A Peer-
to-Peer Information System Based on the Xor Metric,”
Proceedings of IPTPS 2002, Cambridge, March 2002, pp.
53-65.
[9] S. El-Ansary and S. Haridi: An Overview of Structured
P2P Overlay Networks,” Handbook on Theoretical and
Algorithmic Aspects of Sensor, Ad Hoc Wireless, and
Peer-to-Peer Networks, Aurebach Publications, 2006.
[10] M. Castro, M. Costa and A. Rowstron, “Performance and
Dependability of Structured Peer-to-Peer Overlays,”
Proceedings of IEEE DSN’04, Washington, 2004.
[11] S. Wang, D. Xuan and W. Zhao, “On Resilience of
Structured Peer-to-Peer Systems,” Proceedings of IEEE
Globecom, San Francisco, December 2003.
[12] Y. J. Joung, C. T. Fang and L. W. Yang, “Keyword
Search in dht-Based Peer-to-Peer Networks,” Proceed-
ings of IEEE ICDCS’05, 2005, pp. 339-348.
[13] C. Xie, G. Chen, A. Vandenberg and Y. Pan, Analysis of
Hybrid P2P Overlay Network Topology,” Computer
Communications, Elsevier, Vol. 31, No. 2, 2008, pp. 190-
200.
[14] H. Han, J. He and C. Zuo, “A Hybrid P2P Overlay Net-
work for High Efficient Search,” Proceedings of ICIFE
2010, Chongqing, September 2010, pp. 241-245
[15] Y.-F.u Yu, P.-J. Huang, Q.-J. Chen, T.-L. Huang and
K.-C. Lai, On the Design of Semi-structured Multi-Star
Hybrid-Overlays for Multi-attribute Range Queries—Ad-
vances in Grid and Pervasive Computing,” Computer
Science, Springer, Vol. 6104, 2010, pp. 451-460.
[16] A. Kementsietsidis, M. Arenas and R. J. Miller, “Map-
ping Data in Peer-to-Peer Systems: Semantics and Algo-
rithmic Issues,” Proceedings of ACM SIGMOD 2003, San
Diego, June 2003, pp. 325-336.
http://dx.doi.org/10.1145/872757.872798
[17] M. Cai, M. Frank, B. Pan and R. Mac Gregor, “A Sub-
scribable Peer-to-Peer rdf Repository for Distributed Me-
tadata Management. Journal of Web Semantics, Vol. 2,
I. FAKHFAKH ET AL.
Copyright © 2013 SciRes. CN
543
2005.
[18] A. Crespo and H. Garcia-Molina, Semantic Overlay
Networks for p2p Syste m s,” Technical Report, Stanford
University, January 2003.
[19] A. Mawlood-Yunis, M. Weiss and N. Santoro, “From
P2P to Reliable Semantic P2P Sys t em s,” Peer-to-Peer
Networking and Applications, Springer Journal, Vol. 3,
2010.
[20] T. Cerqueus, S. Cazalens and P. Lamarre, “Reducing the
Semantic Heterogeneity of Unstructured P2P Syst e ms : A
Contribution Based on a Dissemination Protocol,” Tran-
sactions on Large-Scale Data- and Knowledge-Systems
VII, Springer, Heidelberg, 2012, pp. 62-95.
[21] CAM4Home ITEA2 Project. www.cam4home-itea.org/
[22] CAM4Home Deli verable D2.1, “Metamodel Definition”.
http://www.w3.org/
[23] A. Rowstron and P. Druschel, Pastry: Scalable, Distri-
buted Object Location and Routing for Large-Scale Peer-
to-Peer Systems,” IFIP/ACM International Conference
on Distributed Systems Platforms (Middleware), Heidel-
berg, November 2001, pp. 329-350,
[24] M. Castro, P. Druschel, Y. C. Hu and A. Rowstron, “To-
pology-Aware Routing in Structured Peer-to-Peer Over-
lay,” Future Directions in Distributed Computing, Sprin-
ger-Verlag, 2003, pp. 103-107.
http://dx.doi.org/10.1007/3-540-37795-6_19
[25] A. Rowstron, A. M. Kermarrec, M. Castro and P. Drus-
chel, “Scribe: The Design of a Large-Scale Event Notifi-
cation Infrastructure,” In: J. Crowcroft and M. Hofmann,
Eds., Networked Group Communication, 3rd Internation-
al COST264 Workshop (NGC’2001) (Nov. 2001), Vol.
2233, pp. 30-43.
[26] I. Filali, F. Bongiovanni, F. Huet and F. Baude “RDF
Data Indexing and Retrieval: A Survey of Peer-to-Peer
Based Solutions,” INRIA, 00540314, Sophia Antipolis,
Novembre 2010.
[27] P. A. Felber, L. Garcés-Erice, E. W. Biersack and G.
Urvoy-Keller, “Data Indexing in Peer-to-Peer DHT Net-
works,” Institute EURECOM, 06904 Sophia Antipolis,
France.
[28] H. T. Shen, Y. F. Shu and B. Yu, “Efficient Seman-
tic-based Content Search in P2P Network,” IEEE Trans-
actions on Knowledge and Data Engineering, Vol. 16, No.
7, July 2004.
[29] J. Y. Zhu and Y. Hu , “Efficient Semantic Search on DHT
Overlays,” Journal of Parallel and Distributed Compu-
ting, Vol. 67, No. 5, 2007, pp. 604-616.
http://dx.doi.org/10.1016/j.jpdc.2007.01.005
[30] E. Liarou, S. Idreos and M. Koubarakis, “Evaluating
Conjunctive Triple Pattern Queries over Large Structured
Overlay Networks,” Proceedings of 5th International
Semantic Web Conference, Athens, November 2006.
[31] M. Cai and M. Frank, “MAAN: A Multi-Attribute Ad-
dressable Network for Grid Information Services,” Pro-
ceeding of 4th International Workshop on Peer-to-Per
Systems, February 2003.
[32] M. Cai and M. Frank, “RDFPeers: A Scalable Distributed
RDF Repository Based on A Structured Peer-to-Peer
Network,” Proceedings of the 13th International Confe-
rence on World Wide Web, 2004, pp. 650-657.
http://dx.doi.org/10.1145/988672.988760
[33] Cam4Home Deliverable D4.32, “Community Network
and Services”.
[34] A. Mesbah and A. van Deursen, “A Component and
Push-Based Architectural Style for AJAX Applications,”
Journal of Systems and Software, Vol. 81, pp. 2194-2209.
[35] ICEfaces, “An Open Source Ajax Framework that En-
ables Java EE Application Developers to Create and
Deploy Server-Based Rich Internet Application (RIA)
Using the Java Language”
http://www.icefaces.org/main/home/.
[36] M. Castro, P. Druschel and A. M. Kermarrec and A.
Rowstron, “SCRIBE: A Large-Scale and Decentralised
Application-Level Multicast Infrastructure,” IEEE Jour-
nal on Selected Areas in Communication (JSAC), Vol. 20,
No. 8, October 2002.