Participatory sensing systems are designed to enable community members to collect, analyze, and share information for their mutual benefit in a cost-effective way. Apparently insensitive information transmitted in plaintext through the inexpensive infrastructure can be used by an eavesdropper to infer sensitive information and threaten the privacy of the participating users. User participation cannot be ensured without assuring the privacy of the participants. Existing techniques add some uncertainty to the actual observation to achieve anonymity, which, however, diminishes data quality/utility to an unacceptable extent. The subset-coding based anonymization technique DGAS [LCN 16] provides the desired level of privacy, but its applicability is limited. In this research, our objective is to overcome this limitation and design a scheme with broader applicability. We have developed a computationally efficient subset-coding scheme and also present a multi-dimensional anonymization technique that anonymizes multiple properties of a user observation, e.g. both the location and the product association of an observer in the context of a consumer price sharing application. To the best of our knowledge, this is the first work that supports multi-dimensional anonymization in PSS. This paper also presents an in-depth analysis of adversary threats considering collusion of adversaries and different report interception patterns. Theoretical analysis, comprehensive simulation, and Android prototype based experiments are carried out to establish the applicability of the proposed scheme. The adversary capability is also simulated to demonstrate our scheme's effectiveness against privacy risk.
Participatory Sensing System (PSS) is a framework that helps community members sense, collect, analyze, and share information obtained from their surroundings for mutual benefit. It has evolved into a cost-effective alternative for reliable and impartial data collection, processing, and dissemination. Participants may use smartphones equipped with high-precision localization capability and cameras, or other ad-hoc sensing devices mounted on vehicles, to record objects/events of interest. The captured data are then reported to the Application Server (ApS) using existing lightweight wireless communication networks. The ApS is expected to extract valuable information from the collection of reports and repay the participants by responding to specific queries on demand. Consumer price information sharing applications [
Privacy preservation of the participants is a prerequisite for the success of PSS. An eavesdropper may infer sensitive information about an observer by intercepting some reports. Hiding data ownership is not an option in this context, as it undermines the reputation schemes needed by the ApS to assess the reliability and trustworthiness of data, as well as the development of incentive mechanisms [
The existing privacy protection mechanisms, where information is transmitted with some anonymity, with added Gaussian noise, or at reduced precision, cannot be used here, as the destination expects complete data recoverability at the individual level. For example, PetrolWatch [
Our previous k-anonymization techniques proposed in [
To the best of our knowledge, this is the first work that simultaneously anonymizes multiple attributes of an observation. The proposed OC-based scheme is first developed for single-dimensional anonymization and then extended to multi-dimensional scenarios. The paper also presents a comprehensive analysis of the privacy risk posed by adversaries. Considering colluding adversary models and different message interception patterns, the analysis confirms the robustness of the proposed k-anonymization scheme against a wide range of malicious attacks. The specific contributions of this paper are listed below:
• Developing the first Multi-dimensional Effective Anonymization Scheme (MDEAS) for PSS, which provides anonymity in multiple dimensions.
• Designing efficient anonymization and de-anonymization algorithms that preserve high data recoverability at the intended destination, and exploring several optimizations.
• Designing a k-anonymization technique that works with variable k, i.e. different users' anonymity preferences. This is useful for designing incentive schemes that consider users' choice of privacy, e.g. offering more incentive to a user who demands less anonymity.
• Presenting a theoretical analysis of the desirable properties of MDEAS and validating it with extensive simulation.
• Comprehensively analyzing privacy risk through simulation in the presence of different types of adversaries.
• Conducting real-world experiments with an Android-based prototype.
The rest of the paper is organized as follows. Section 2 describes the system model of PSS, its related terminology, and the adversary model. Section 3 explains our proposed scheme with a detailed example. Section 4 presents the optimizations that can be applied in MDEAS, with an example. Section 5 presents a theoretical analysis of the number of reports required to achieve full de-anonymization. Section 6 presents the algorithms and their computational complexity. In Section 7, we present our simulation setup and experimental results. In Section 8, we discuss previously proposed privacy schemes for PSS along with their limitations for real-world applications. Finally, Section 9 concludes the paper.
In PSS, users (mobile nodes) report their observations about some objects or events to an Application Server (ApS). The ApS wants to collect information about particular objects/events (e.g. the price of fuel). We refer to such a setting as a PSS scenario. We use the term Objects of Interest to denote the objects/events that a PSS scenario is interested in.
Definition 1 (Object of Interest). An Object of Interest (OOI) is an object/event whose attribute/property is observed and reported by the participants of a PSS application.
Here we briefly discuss the major entities of a PSS scenario. The users are independent: they do not send reports collectively, and there is no apparent communication between them. To protect the privacy of a user, several anonymization schemes in PSS use an Anonymization Server (AnS) [
1) Anonymization Step: The user reports the observed data to the Anonymization Server (AnS), which transforms it and returns an anonymized report (AR) to her. Since incentive or reputation schemes do not rely on this communication, the user's association with these reports need not be preserved. This reporting is done through mix-network based communication [
2) Reporting Step: The user sends the AR to the ApS along with his/her identity information. The ApS de-anonymizes these reports to map each attribute to an OOI in order to offer the desired service to the participants. This reporting is done through a conventional, inexpensive, and unencrypted communication network.
Note that the primary purpose of using the AnS is to reduce the number of observations the ApS requires to map all the OOIs to their corresponding attributes. The report sent to the AnS for anonymization contains the user's anonymity preference (denoted as k), the OOI identification, and the observed value/attribute. For example, a participant John observes the price of a camera. He wants to report the camera's price with 3-anonymity, i.e. \(k = 3\). So he sends the report \(\langle \text{Camera}\{3\} \rangle : \$100\) to the AnS. The AnS may then anonymize his report as \(\langle \text{Camera}, \text{Phone}, \text{GPS} \rangle : \$100\) and return this AR to John. Next, John sends this AR to the ApS with his identity.
As mentioned in the previous section, users' association with multiple objects/events can also be protected simultaneously in our proposed scheme. Suppose John reports the price of a camera at a particular location \(loc_1\) and wants to anonymize his report in terms of both location and product simultaneously. In many cases, this is essential, as the price of a product varies with location. We use \(N\) and \(S\) to denote the total number of OOIs and the set of all OOIs, respectively, in a single-dimensional PSS. For a d-dimensional PSS scenario, the numbers of OOIs in the d dimensions are denoted as \(N_1, N_2, \dots, N_d\) and their respective sets of OOIs as \(S_1, S_2, \dots, S_d\). The anonymity preference can even differ across dimensions; we use \(k_1, k_2, \dots, k_d\) to denote these preferences. The term OOI Combination denotes the collection of d OOIs, one from each dimension, for which an attribute is reported. Accordingly, the total number of OOI combinations is \(X = \prod_{i=1}^{d} N_i\). Suppose John reports his observed data to the AnS as \(\langle \text{Camera}\{2\}, loc_1\{3\} \rangle : \$100\) as shown in
Definition 2 (Anonymized Report). An Anonymized Report (AR) for an observed report \(\langle OOI_1^1\{k_1\}, OOI_2^1\{k_2\}, \dots, OOI_d^1\{k_d\} \rangle : v\), where \(OOI_i^1\) is the observed OOI in the ith dimension, is expressed as \(\langle \{OOI_1^1, OOI_1^2, \dots, OOI_1^{k_1}\}, \{OOI_2^1, OOI_2^2, \dots, OOI_2^{k_2}\}, \dots, \{OOI_d^1, OOI_d^2, \dots, OOI_d^{k_d}\} \rangle : v\) such that each \(OOI_i^j \in S_i\).
Hence, the task of anonymization is basically to select some extra OOIs from the relevant available alternatives along with the real OOI according to the user’s preference of anonymity. A good anonymization algorithm should select the extra OOIs in such a way that ApS can de-anonymize them with few ARs.
Data quality is achieved when the ARs of a PSS scenario are fully de-anonymized. Here, we define the term Full De-anonymization as follows:
Definition 3 (Full De-anonymization). The outcome of an anonymization technique achieves full de-anonymization iff all \(N\) OOIs (single-dimensional scenario) or all \(X\) OOI combinations (multi-dimensional scenario) can be associated with their correct attributes.
Another desired property of an anonymization technique in our context is to achieve full de-anonymization from a feasibly low number of anonymized reports. To measure this property, we define the term NRRFD as follows:
Definition 4 (NRRFD). NRRFD (Number of Reports Required for Full Deanonymization) refers to the total number of ARs required to achieve full deanonymization in a particular PSS scenario.
The NRRFD depends on the order in which ARs appear in the PSS. Anonymization techniques therefore select the extra OOIs carefully to keep the NRRFD as low as possible.
In our model, we assume that each OOI has a unique attribute, which may not hold in some scenarios. However, the AnS can transform a non-unique scenario into a unique one: when it receives the same attribute for different OOIs, it makes the attribute unique by adding a small value below the level of significance.
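The transformation can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes float-valued prices, and the function `make_unique`, the `seen` dictionary (attribute value to OOI), and the step `epsilon` are hypothetical names chosen for this example. A colliding value is nudged upward until no other OOI uses it, and repeated reports of the same OOI map to the same nudged value.

```python
def make_unique(value, ooi, seen, epsilon=0.001):
    """Return an attribute value for `ooi` that no other OOI is using.

    `seen` maps each attribute value already handed out to its OOI.
    `epsilon` is a hypothetical step assumed to lie below the level of
    significance (e.g. a tenth of a cent for prices).
    """
    # Step upward while the value is taken by a *different* OOI.
    while value in seen and seen[value] != ooi:
        value = round(value + epsilon, 6)  # round to avoid float drift
    seen[value] = ooi
    return value


seen = {}
print(make_unique(10.0, "A", seen))  # 10.0
print(make_unique(10.0, "B", seen))  # 10.001
```

Keeping the OOI alongside each issued value is what makes repeated reports consistent; a deployed AnS would also need to bound how far a value may drift, which this sketch does not do.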
The adversaries of PSS are assumed to be rational, i.e. they do not attack the operation of the system. Rather, they try to eavesdrop on messages and reveal users' private information. As the AR sent to the ApS includes the participant's identity, the adversary residing near the ApS is the strongest one (see
The primary strategy against adversaries is to divide the anonymization tasks of a PSS among different AnSs, as presented in [
Note that user registration for a particular group of OOIs is done only occasionally. Hence, it can be done via a secure website (e.g. over HTTPS). Thus, the group association of a user is not revealed to the adversary, but is known to the ApS, which can de-anonymize ARs according to their groups. However, our previous works [
We divide adversaries into three types depending on how they intercept the ARs of a PSS scenario as shown in
Besides this interception pattern, we also consider some enhanced capabilities of adversaries as follows.
Each adversary intercepts certain ARs and tries to de-anonymize those. However, the threat becomes stronger if multiple adversaries share information among them. As Type 2 and Type 3 adversaries cannot intercept all ARs, they can share their intercepted ARs among themselves and become stronger. Moreover, all types of adversaries can also share their own observed OOI-attribute mapping among themselves and de-anonymize attributes with the combined information.
As adversaries cannot distinguish reports of different PSS groups, they cannot reveal the real OOI from their de-anonymization result. Even if we assume that the adversaries know the group mapping by registering in all groups or by collusion with members of other groups, it is still not possible to reveal the real OOI from the reported OOI name as adversaries do not know the users’ group id. This argument is validated with simulation results.
Suppose the adversaries use distance estimation to predict the real location (OOI) among the OOIs of different groups that share the same local ID. A location equidistant from all these OOIs may be taken as a representative point of all of them, and the adversary may consider the OOI nearest to this representative point to be the real OOI. We shall also empirically investigate whether this strategy enables the adversary to cause more risk.
Type of Adversary | Interception Pattern |
---|---|
Type 1 | All ARs for whole period |
Type 2 | All ARs for a limited period |
Type 3 | Random proportion of ARs for whole period |
We now discuss the conceptual framework of our proposed scheme, MDEAS (Multi-dimensional Effective Anonymization Scheme), with illustrative examples. MDEAS is computationally efficient and supports variable-length anonymity. Instead of keeping all possible combinations of attribute-OOI mappings, MDEAS keeps track of occurrence counts or absence counts for each reported attribute. When a user sends an actual report to the AnS, the AnS anonymizes it in such a way that the ApS can de-anonymize as many attributes as possible. The whole scheme can be divided into two parts: anonymization and de-anonymization. For the convenience of the reader, we first discuss the simple case of a single dimension and then explain its extension to multiple dimensions.
To explain the concept, we assume a consumer price sharing PSS scenario with \(N = |S| = 4\), which collects the prices of four products named A, B, C, and D, priced at $10, $20, $30, and $40, respectively. We chose these parameter values by analyzing real-world application scenarios. For the sake of simplicity, we assume unique attributes. We assume the order of appearance of observations shown in
The goal of the anonymization process is to generate an AR by selecting \(k - 1\) additional OOIs along with the observed OOI in such a way that joint de-anonymization can be done with a feasibly low number of ARs. Without any control over the distribution or order of observations, this cannot be done optimally. However, the AnS uses heuristics such as maximizing the diversity of an AR with respect to previously generated ARs for the same OOI. The AnS maintains a data structure named Inverse Occurrence Checklist (IOC) for the N OOIs. The IOC for an OOI p, denoted as \(IOC_p\), contains the absence count of every other OOI \(q \in S \setminus \{p\}\), where S is the set of OOIs. We use the notation \(IOC_p(q)\) to express how many times q could have been included in, but was excluded from, the ARs of OOI p.
The rule for identifying whether an OOI might be de-anonymized is as follows:
Rule 1. (Rule of being de-anonymizable) An OOI p is de-anonymizable if
\[ \forall q \in S,\ q \neq p:\quad IOC_p(q) > 0 \tag{1} \]
The rule for selecting k − 1 OOIs is as follows:
Rule 2. (Rule for selection of OOIs in an AR) To anonymize a report containing the attribute of OOI p, the set of selected OOIs, \(S'\), is formed as:
\[ S' = \{p\} \cup \{(k-1) \text{ largest IOC-valued OOIs in } S \setminus \{p\}\} \tag{2} \]
After anonymizing the report, the AnS increments, in \(IOC_p\), the counts of those OOIs in S that have not been included in this AR. This rule for updating IOC values can be formalized as:
Rule 3. (Rule of updating IOC) After producing an AR for OOI p, \(IOC_p\) is updated as
\[ \forall q \in S,\ q \neq p,\ q \notin S':\quad IOC_p(q) \leftarrow IOC_p(q) + 1 \tag{3} \]
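Rules 1-3 can be sketched as a small Python class. This is an illustrative sketch under stated assumptions rather than the authors' code: the class and method names are hypothetical, and ties among equal IOC values are broken randomly with a seeded generator (the text only states that the AnS chooses randomly at the initial step).

```python
import random


class SingleDimAnonymizer:
    """AnS-side sketch of Rules 1-3 for one dimension.

    Assumption: ties among equal IOC values are broken randomly, with a
    seeded generator so that runs are reproducible.
    """

    def __init__(self, oois, seed=0):
        self.oois = list(oois)
        # ioc[p][q]: how many ARs for p excluded q (the absence count)
        self.ioc = {p: {q: 0 for q in self.oois if q != p} for p in self.oois}
        self.rng = random.Random(seed)

    def is_deanonymizable(self, p):
        # Rule 1: every other OOI has been excluded at least once
        return all(count > 0 for count in self.ioc[p].values())

    def anonymize(self, p, k):
        others = list(self.ioc[p])
        self.rng.shuffle(others)  # random order among equal IOC values
        # Stable sort: largest IOC first, shuffled order kept within ties
        others.sort(key=lambda q: self.ioc[p][q], reverse=True)
        selected = {p, *others[:k - 1]}  # Rule 2
        for q in others[k - 1:]:
            self.ioc[p][q] += 1  # Rule 3: count every excluded OOI
        return selected


anon = SingleDimAnonymizer("ABCD", seed=1)
for _ in range(3):
    print(sorted(anon.anonymize("A", 3)), anon.is_deanonymizable("A"))
```

With \(N = 4\) and \(k = 3\), every AR excludes exactly one OOI, so A becomes de-anonymizable after its third report, consistent with \(\lceil (N-1)/(N-k) \rceil = 3\) from Section 5.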
All IOC values are initialized with zero (
In this state, for OOI A, all the IOC values, i.e. \(IOC_A(B)\), \(IOC_A(C)\), and \(IOC_A(D)\), are greater than zero. Hence, according to Rule 1, A is de-anonymizable. The remaining OOIs become de-anonymizable in the same way as subsequent observations arrive. After the arrival of the ninth report, all the OOIs are de-anonymizable and the IOC table reaches the state shown in
The de-anonymization process runs at the ApS. As the ApS may not have prior knowledge of all OOIs, it cannot construct a fixed-length IOC table. Therefore, the ApS maintains another data structure named OC (Occurrence Checklist) for each reported attribute. The OC for a reported attribute v, denoted as \(OC_v\), tracks the occurrence counts of the candidate OOIs for v. We use the notation \(OC_v(p)\) to denote how many times p has been reported as a candidate OOI in all reports of attribute v. Besides \(OC_v\), the ApS tracks the total number of reports for each reported attribute v, denoted as \(T_v\). When the ApS receives an AR, it follows the steps below:
1) Creates \(OC_v\) if v is reported to the ApS for the first time (all values initialized to zero).
2) Sets \(T_v \leftarrow T_v + 1\).
3) Updates \(OC_v(p)\) as
\[ \forall p \in \text{OOIs of AR}:\quad OC_v(p) \leftarrow OC_v(p) + 1 \]
An attribute v is mapped to an OOI p, i.e. p is de-anonymized by the ApS, if the following rule is satisfied for \(OC_v\).
Rule 4. (Rule of being de-anonymized) The attribute v is de-anonymized for OOI p if
\[ OC_v(p) = T_v \ \wedge\ \forall q \in S,\ q \neq p:\ OC_v(q) < T_v \tag{4} \]
Considering our example, the ApS first receives the AR \(\langle A, B, C \rangle : \$10\). Since $10 has not been reported before, the ApS creates an OC for $10, denoted as \(OC_{\$10}\). As it is the first report of $10, the ApS sets \(T_{\$10}\) to one. This report indicates that $10 is a possible attribute of A, B, or C. Hence, the ApS creates three OC columns for A, B, and C, denoted as \(OC_{\$10}(A)\), \(OC_{\$10}(B)\), and \(OC_{\$10}(C)\), respectively, and increments their OC values as shown in
The second AR received by the ApS is \(\langle A, B, D \rangle : \$10\). As $10 has been reported before, the ApS does not need to create \(OC_{\$10}\) again. However, OOI D is reported to the ApS for the first time in this report. Hence, the ApS creates an additional column for OOI D. Next, the ApS increments the OC values of the candidate OOIs of this report, i.e. \(OC_{\$10}(A)\), \(OC_{\$10}(B)\), and \(OC_{\$10}(D)\), by one (
\[ OC_{\$10}(A) = T_{\$10} = 3 \quad \text{and all other OCs, i.e.,} \quad OC_{\$10}(B) = OC_{\$10}(C) = OC_{\$10}(D) < T_{\$10} \]
Therefore, according to Rule 4, the ApS can de-anonymize the attribute $10, as only the OC value of OOI A, \(OC_{\$10}(A)\), is equal to \(T_{\$10}\).
In the same way, B, C, and D are de-anonymized by the ApS after receiving the fourth to ninth ARs (
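The ApS-side bookkeeping (steps 1-3 and Rule 4) can be sketched as follows. The class name is hypothetical, and the third AR for $10, which the text does not list, is assumed to be \(\langle A, C, D \rangle\): with \(k = 3\) it is the only AR consistent with \(OC_{\$10}(A) = 3\) and all other OC values below 3.

```python
from collections import defaultdict


class SingleDimDeanonymizer:
    """ApS-side sketch: one OC per reported attribute plus Rule 4."""

    def __init__(self):
        # oc[v][p] = OC_v(p); missing entries behave as zero (step 1)
        self.oc = defaultdict(lambda: defaultdict(int))
        self.total = defaultdict(int)  # total[v] = T_v

    def receive(self, candidates, v):
        self.total[v] += 1  # step 2: T_v <- T_v + 1
        for p in candidates:
            self.oc[v][p] += 1  # step 3: OC_v(p) <- OC_v(p) + 1

    def resolve(self, v):
        """Return the OOI mapped to v by Rule 4, or None if still ambiguous."""
        t = self.total[v]
        # OOIs never listed for v implicitly have OC 0 < T_v
        full = [p for p, c in self.oc[v].items() if c == t]
        return full[0] if t > 0 and len(full) == 1 else None


aps = SingleDimDeanonymizer()
aps.receive({"A", "B", "C"}, 10)
aps.receive({"A", "B", "D"}, 10)
print(aps.resolve(10))  # None: A and B both have OC = T = 2
aps.receive({"A", "C", "D"}, 10)  # assumed third AR for $10
print(aps.resolve(10))  # A
```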
For multi-dimensional PSS scenario, we apply the same rules of anonymization and de-anonymization explained in the previous section for each dimension.
For the sake of simplicity, we discuss this process by restricting our example scenario to two dimensions, i.e. \(d = 2\). Accordingly, we assume a PSS application that deals with the prices of 3 products, \(S_1 = \{A, B, C\}\), at three different locations, \(S_2 = \{X, Y, Z\}\). Hence, the total number of OOI combinations is \(3 \times 3 = 9\), and their set is \(R = \{(A, X), (A, Y), \dots, (C, Y), (C, Z)\}\). The observed attributes for each OOI combination are shown in
For each dimension i, the AnS chooses \(k_i - 1\) OOIs along with the real OOI, where \(k_i\) is the user's anonymity preference for the ith dimension. The AnS maintains d different IOCs for each OOI combination \(r \in R\). In our example, the first report \(\langle A\{2\}, X\{2\} \rangle : \$11\) refers to the price of A at location X. To anonymize these two OOIs, i.e. product and location, the AnS randomly chooses the additional OOIs B (for product) and Y (for location) at the initial step. After producing this AR, i.e. \(\langle \{A, B\}, \{X, Y\} \rangle : \$11\), the AnS increments \(IOC^1_{A,X}(C)\) and \(IOC^2_{A,X}(Z)\), as C and Z are not included in this AR. Here, the superscripts 1 and 2 refer to the respective dimensions.
Following the strategy as shown in
In the multi-dimensional scenario, we use the notation \(OC_v^i\) to denote the OC of the ith dimension for a reported attribute v, and \(OC_v^i(p)\) to denote the OC value for OOI p in that OC.
In our example, after receiving the first AR \(\langle \{A, B\}, \{X, Y\} \rangle : \$11\), the ApS creates one row to keep the information of $11. In one column, the ApS keeps the total report count \(T_{\$11}\), and in two other columns it keeps the OC values for the two dimensions, i.e. \(OC^1_{\$11}\) and \(OC^2_{\$11}\). As this is the first report for $11, \(T_{\$11}\) is set to one. From this report, the ApS learns that A and B are the candidate OOIs for the first dimension and X and Y for the second. It creates columns for these OOIs in the respective dimensions of the OC and increments \(OC^1_{\$11}(A)\), \(OC^1_{\$11}(B)\), \(OC^2_{\$11}(X)\), and \(OC^2_{\$11}(Y)\) by one. The remaining de-anonymization process continues in the same manner (
MDEAS can improve its performance by adopting some optimization techniques. In the anonymization process, the IOC counts refer to the OOIs that have been ruled out from the possible mappings at the ApS. Hence, while choosing OOIs for anonymization, we prefer the OOIs with the highest IOC values. However, OOIs that are already de-anonymized are even better candidates for inclusion in ARs, because the ApS has already ruled them out. We therefore redefine Rule 1 and Rule 2 as follows, where \(S_D\) denotes the set of de-anonymizable OOIs in the current anonymization process.
Rule 5. (Rule of being de-anonymizable) An OOI p is de-anonymizable if
\[ \forall q \in S,\ q \neq p:\quad IOC_p(q) > 0 \ \vee\ q \in S_D \tag{5} \]
The rule of anonymizing an observed report by AnS is as follows:
Rule 6. (Rule for selection of OOIs in an AR) To anonymize a report containing the attribute of OOI p, the set of selected OOIs, \(S'\), is formed as:
\[ S' = \begin{cases} \{p\} \cup \{\text{any } k-1 \text{ OOIs in } S_D\}, & \text{if } |S_D| \geq k-1 \\ \{p\} \cup S_D \cup \{(k-1-|S_D|) \text{ largest IOC-valued OOIs in } (S \setminus \{p\}) \setminus S_D\}, & \text{if } |S_D| < k-1 \end{cases} \tag{6} \]
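A sketch of the optimized AnS (Rules 5 and 6) is given below, again under assumptions not stated in the text: random tie-breaking, and \(S_D\) recomputed to a fixed point after every AR, because one OOI entering \(S_D\) can let another satisfy Rule 5. The class and method names are hypothetical. The demo mirrors the idea of the example that follows: once A is de-anonymizable, B reported with \(k = 2\) is paired with A rather than with an undecided OOI.

```python
import random


class OptimizedAnonymizer:
    """AnS-side sketch of Rules 5 and 6 (single dimension).

    Assumptions: random tie-breaking among equal IOC values, and S_D
    recomputed to a fixed point after each AR.
    """

    def __init__(self, oois, seed=0):
        self.oois = list(oois)
        self.ioc = {p: {q: 0 for q in self.oois if q != p} for p in self.oois}
        self.sd = set()  # S_D: OOIs that are already de-anonymizable
        self.rng = random.Random(seed)

    def _rule5(self, p):
        return all(c > 0 or q in self.sd for q, c in self.ioc[p].items())

    def _refresh_sd(self):
        changed = True
        while changed:
            changed = False
            for p in self.oois:
                if p not in self.sd and self._rule5(p):
                    self.sd.add(p)
                    changed = True

    def anonymize(self, p, k):
        done = [q for q in self.ioc[p] if q in self.sd]
        rest = [q for q in self.ioc[p] if q not in self.sd]
        self.rng.shuffle(done)
        self.rng.shuffle(rest)
        rest.sort(key=lambda q: self.ioc[p][q], reverse=True)
        # Rule 6: de-anonymized OOIs first, then the largest IOC values
        selected = {p, *(done + rest)[:k - 1]}
        for q in self.ioc[p]:
            if q not in selected:
                self.ioc[p][q] += 1  # IOC update as in Rule 3
        self._refresh_sd()
        return selected


an = OptimizedAnonymizer("ABCD", seed=3)
for _ in range(3):
    an.anonymize("A", 3)             # A becomes de-anonymizable
print(sorted(an.anonymize("B", 2)))  # ['A', 'B']: A is preferred
print(sorted(an.sd))                 # ['A', 'B']
```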
Similarly, while de-anonymizing reports, OOIs that are already de-anonymized for other attributes are automatically ruled out from the candidate lists, and their OC values are not considered. Rule 4 is modified as follows.
Rule 7. (Rule of being de-anonymized) The attribute v is de-anonymized for OOI p if
\[ OC_v(p) = T_v \ \wedge\ \forall q \in S,\ q \neq p,\ q \notin S_D:\ OC_v(q) < T_v \tag{7} \]
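Rule 7 at the ApS can be sketched like this (hypothetical names). Here \(S_D\) is taken to be the set of OOIs already mapped to other attributes, and the resolution loop repeats because mapping one attribute may settle another.

```python
from collections import defaultdict


class OptimizedDeanonymizer:
    """ApS-side sketch of Rule 7 (single dimension)."""

    def __init__(self):
        self.oc = defaultdict(lambda: defaultdict(int))  # OC_v(p)
        self.total = defaultdict(int)                    # T_v
        self.mapping = {}                                # v -> OOI

    def _rule7(self, v):
        # S_D here: OOIs already mapped to some other attribute
        sd = {p for a, p in self.mapping.items() if a != v}
        t = self.total[v]
        full = [p for p, c in self.oc[v].items() if c == t and p not in sd]
        return full[0] if len(full) == 1 else None

    def receive(self, candidates, v):
        self.total[v] += 1
        for p in candidates:
            self.oc[v][p] += 1
        # Resolving one attribute can rule out candidates of another,
        # so repeat until no further attribute is resolved.
        changed = True
        while changed:
            changed = False
            for a in list(self.total):
                if a not in self.mapping:
                    p = self._rule7(a)
                    if p is not None:
                        self.mapping[a] = p
                        changed = True


aps = OptimizedDeanonymizer()
for ar in ({"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "D"}):
    aps.receive(ar, 10)
aps.receive({"A", "B"}, 20)  # A is already mapped to $10
print(aps.mapping)  # {10: 'A', 20: 'B'}
```

Note that $20 is resolved from a single AR here because A's column is ignored once A is mapped to $10.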
Here, we present a simple example based on our running example in Section 3. In that example, B is de-anonymized after receiving the fifth report. However, if this optimization is applied, B would be de-anonymized at the fourth report. According to Rule 6, the AnS would produce the AR \(\langle A, B \rangle : \$20\) instead of \(\langle B, D \rangle : \$20\). Here, A is chosen instead of D because A is already de-anonymized. In this case, \(IOC_B\) will look like the following:
In this state, B becomes de-anonymizable according to Rule 5, as \(IOC_B(C), IOC_B(D) > 0\) and A is already de-anonymized for another attribute.
In this section, we present a theoretical analysis of the NRRFD defined earlier. As the order of appearance of reports (ARs) is probabilistic, we derive the expected NRRFD using probability theory. Consider a d-dimensional PSS scenario where the i-th dimension has \(N_i\) OOIs that are reported with \(k_i\) anonymity for all \(1 \le i \le d\). Overall, there are \(X = \prod_{i=1}^{d} N_i\) distinct OOI combinations that need to be reported with as many unique attributes. Our de-anonymization scheme carries out an independent, "attribute-centric" de-anonymization process for each of the X unique attributes.
For each AR that the ApS receives for a particular attribute, it eliminates \(N_i - k_i\) OOIs from the potential list of OOIs in the i-th dimension, on the basis that their OC values are less than the number of ARs received so far. As the anonymization process selects unobserved OOIs in the order of their IOC values, the de-anonymization process at the ApS is able to eliminate \(N_i - k_i\) further OOIs from the potential list of observed OOIs in the i-th dimension with each received report. To isolate the actual OOI, the ApS needs to eliminate the \(N_i - 1\) other OOIs. Therefore, the de-anonymization process in the i-th dimension requires \(\lceil (N_i - 1)/(N_i - k_i) \rceil\) reports to isolate the observed OOI in that dimension. As the de-anonymization processes in different dimensions are independent of each other and are carried out in parallel, the de-anonymization of a particular attribute is complete, with the OOIs isolated in all d dimensions, after receiving
\[ Y = \max_{1 \le i \le d} \left\lceil \frac{N_i - 1}{N_i - k_i} \right\rceil \]
reports.
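The per-attribute report count \(Y\) and the number of OOI combinations \(X\) can be computed directly. A minimal sketch with hypothetical function names, assuming \(k_i < N_i\) in every dimension:

```python
from math import prod


def reports_per_attribute(n, k):
    """Y = max_i ceil((N_i - 1) / (N_i - k_i)); requires k_i < N_i."""
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return max(-(-(ni - 1) // (ni - ki)) for ni, ki in zip(n, k))


def ooi_combinations(n):
    """X = product of N_i over all dimensions."""
    return prod(n)


# Two-dimensional example from Section 3: N = (3, 3), k = (2, 2)
print(ooi_combinations([3, 3]), reports_per_attribute([3, 3], [2, 2]))  # 9 2
```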
Ideally, the anonymization framework needs only \(n_{ideal} = XY\) reports to de-anonymize the attributes of all X OOI combinations. This lower bound, however, can be met only if each unique attribute is observed exactly Y times, which is an unrealistic assumption. The probability of a particular attribute being observed is \(p = 1/X\). After n observations, the number of times each unique attribute v is reported, \(n_v\), can be assumed to be normally distributed with mean \(\mu = np = n/X\) and variance \(\sigma^2 = np(1-p) = \frac{n}{X}\left(1 - \frac{1}{X}\right) = \frac{n(X-1)}{X^2}\) according to the Central Limit Theorem, i.e. \(n_v \sim \mathcal{N}(\mu, \sigma^2)\). The minimum of X such \(n_v\)'s follows the Gumbel distribution ( [
\[ z = \Phi^{-1}\left(1 - \frac{1}{X}\right) + \gamma\left(\Phi^{-1}\left(1 - \frac{1}{Xe}\right) - \Phi^{-1}\left(1 - \frac{1}{X}\right)\right) \tag{8} \]
and \(\Phi^{-1}\) is the inverse CDF (cumulative distribution function) of the standard normal distribution \(\mathcal{N}(0, 1)\), and \(\gamma = 0.5772\) is the Euler-Mascheroni constant [
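We may now find the expected NRRFD, \(\bar{n}\), needed to de-anonymize the values of all X OOI combinations by requiring the expected minimum count, \(\mu - z\sigma\), to equal Y, i.e. by finding the root of the following equation, which is quadratic in \(\sqrt{\bar{n}}\).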
\[ \frac{\bar{n}}{X} - z\,\frac{\sqrt{\bar{n}(X-1)}}{X} = Y \tag{9} \]
Simplifying the equation above, we find
\[ \sqrt{\bar{n}} = \frac{1}{2}\left(z\sqrt{X-1} + \sqrt{z^2(X-1) + 4XY}\right) \tag{10} \]
\[ \bar{n} = \frac{1}{4}\left(2z^2(X-1) + 4XY + 2\sqrt{z^2(X-1)\left(z^2(X-1) + 4XY\right)}\right) \tag{11} \]
Finally, simplifying the above equation further, we get
\[ \bar{n} \cong n_{ideal}\left(1 + \frac{z^2}{2Y} + \frac{z}{\sqrt{Y}}\right) \tag{12} \]
is larger than \(N_1\), \(n_{ideal}\) increases quadratically with both X and Y. This graph also shows that configurations with similar \(n_{ideal}\) values require similar NRRFD.
The value of NRRFD obtained from this mathematical analysis agrees with the results of our simulations. We present both theoretical and simulation results in Section 7.2.
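Equations (8)-(12) can be evaluated numerically as a sanity check. This sketch assumes Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf` method supplies \(\Phi^{-1}\); the function names and the chosen \((X, Y)\) are illustrative only and do not reproduce the paper's reported numbers.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_z(x):
    """Eq. (8): z for X attributes (requires X >= 2)."""
    inv = NormalDist().inv_cdf  # inverse CDF of N(0, 1)
    a = inv(1 - 1 / x)
    b = inv(1 - 1 / (x * math.e))
    return a + EULER_GAMMA * (b - a)


def expected_nrrfd(x, y):
    """Eqs. (10)-(11): square of the positive root for sqrt(n)."""
    z = gumbel_z(x)
    root = 0.5 * (z * math.sqrt(x - 1) + math.sqrt(z * z * (x - 1) + 4 * x * y))
    return root * root


def approx_nrrfd(x, y):
    """Eq. (12): n_ideal * (1 + z^2 / (2Y) + z / sqrt(Y))."""
    z = gumbel_z(x)
    return x * y * (1 + z * z / (2 * y) + z / math.sqrt(y))


# Illustrative configuration: X = 9 OOI combinations, Y = 2 (Section 3 example)
x, y = 9, 2
print(x * y, round(expected_nrrfd(x, y), 1), round(approx_nrrfd(x, y), 1))
```

Solving for \(\sqrt{\bar{n}}\) first and squaring avoids evaluating the nested radical of Eq. (11) directly; both forms give the same value.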
In this section, we present the algorithms used for anonymizing observations at the AnS and de-anonymizing them at the ApS. These can be applied to PSS scenarios with any number of dimensions.
Algorithm 1 is used by the AnS to anonymize a d-dimensional observation. It takes as input a set of user anonymity preferences \((k_1, k_2, \dots, k_d)\) for the d dimensions and the corresponding OOI combination \(r = (p_1, p_2, \dots, p_d)\). Recall that \(p_1, p_2, \dots, p_d\) are OOIs of different dimensions, such as location and product. The algorithm uses the corresponding IOC, i.e. \(IOC_r\), to anonymize this report. For each dimension i, \(k_i - 1\) extra OOIs are selected from the set of OOIs in that dimension, \(S_i\), by preferring the OOIs with the highest IOC values and the de-anonymized ones, following Rule 2. These selected OOIs, along with the observed OOI, are put into the set \(S'_i\). After preparing \(S'_i\), the algorithm updates \(IOC_r\) by incrementing the IOC count of each OOI \(q \in S_i \setminus S'_i\). Finally, the algorithm returns \(S' = \{S'_1, S'_2, \dots, S'_d\}\), where \(S'_i\) denotes the anonymized OOI set for the ith dimension.
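Algorithm 1's listing is not reproduced in this text, so the following is only a sketch of the behaviour described above: one IOC per (OOI combination, dimension), Rule 2 selection and Rule 3 updates per dimension, and random tie-breaking as an assumption. Class and method names are hypothetical.

```python
import random


class MultiDimAnonymizer:
    """AnS-side sketch of Algorithm 1: one IOC per (combination, dimension)."""

    def __init__(self, dims, seed=0):
        self.dims = [list(s) for s in dims]  # S_1, ..., S_d
        self.ioc = {}                        # (r, i) -> {q: absence count}
        self.rng = random.Random(seed)

    def _table(self, r, i):
        # Lazily create IOC^i_r with zero counts for q in S_i, q != p_i
        if (r, i) not in self.ioc:
            self.ioc[(r, i)] = {q: 0 for q in self.dims[i] if q != r[i]}
        return self.ioc[(r, i)]

    def anonymize(self, r, v, ks):
        """r = (p_1, ..., p_d), ks = (k_1, ..., k_d); returns ([S'_1, ..., S'_d], v)."""
        result = []
        for i, (p, k) in enumerate(zip(r, ks)):
            table = self._table(r, i)
            others = list(table)
            self.rng.shuffle(others)  # random tie-breaking
            others.sort(key=lambda q: table[q], reverse=True)
            result.append({p, *others[:k - 1]})  # Rule 2 in dimension i
            for q in others[k - 1:]:
                table[q] += 1                    # Rule 3 in dimension i
        return result, v

    def is_deanonymizable(self, r):
        # Rule 1 must hold in every dimension of the combination r
        return all(all(c > 0 for c in self._table(r, i).values())
                   for i in range(len(self.dims)))


ans = MultiDimAnonymizer([["A", "B", "C"], ["X", "Y", "Z"]], seed=5)
for _ in range(2):
    sets, v = ans.anonymize(("A", "X"), 11, (2, 2))
    print([sorted(s) for s in sets], v, ans.is_deanonymizable(("A", "X")))
```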
Algorithm 1. \(\{S', v\}\): Anonymize(\(\{p_1, \dots, p_d\}, v, \{k_1, \dots, k_d\}\)).
Algorithm 2. \(P\): De-anonymize(\(S'_1, S'_2, \dots, S'_d, v\)).
Algorithm 2 is used by the ApS to de-anonymize the ARs. Here, the inputs \(S'_i\) and \(v\) denote the set of anonymized OOIs in the ith dimension and the reported attribute, respectively. For each dimension i, the algorithm increments the OC value \(OC_v^i(q)\) of every OOI \(q \in S'_i\). To check whether a reported attribute v has been de-anonymized, the ApS checks Rule 4 for each dimension i. If the observed OOIs of all dimensions are de-anonymized according to Rule 4, the ApS knows the actual OOI combination for v.
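Likewise, here is a hedged sketch of the described Algorithm 2 behaviour: the ApS keeps one OC per dimension for every attribute and applies Rule 4 in each dimension independently. The class name is hypothetical, and the second AR in the demo is an invented example chosen to be consistent with \(k_1 = k_2 = 2\).

```python
from collections import defaultdict


class MultiDimDeanonymizer:
    """ApS-side sketch of Algorithm 2: Rule 4 applied in each dimension."""

    def __init__(self, d):
        self.total = defaultdict(int)  # T_v
        # oc[v][i][q] = OC^i_v(q), one counter per dimension
        self.oc = defaultdict(lambda: [defaultdict(int) for _ in range(d)])

    def deanonymize(self, sets, v):
        """sets = [S'_1, ..., S'_d]; returns the OOI combination or None."""
        self.total[v] += 1
        t = self.total[v]
        combo = []
        for i, s in enumerate(sets):
            for q in s:
                self.oc[v][i][q] += 1  # OC^i_v(q) <- OC^i_v(q) + 1
            full = [q for q, c in self.oc[v][i].items() if c == t]
            combo.append(full[0] if len(full) == 1 else None)
        # v is known only once every dimension has been isolated
        return None if None in combo else tuple(combo)


aps = MultiDimDeanonymizer(d=2)
print(aps.deanonymize([{"A", "B"}, {"X", "Y"}], 11))  # None
print(aps.deanonymize([{"A", "C"}, {"X", "Z"}], 11))  # ('A', 'X')
```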
Note that, as discussed in Section 4, the anonymization Algorithm 1 can be optimized for a single dimension to achieve faster de-anonymization. To do this, we need to keep track of the already de-anonymized OOIs and prioritize them when adding OOIs to the anonymized set in Line 4 of Algorithm 1. However, this optimization is not applicable in the multi-dimensional scenario, as the attribute of an OOI depends on multiple dimensions and the anonymization is done separately for each dimension.
To establish the applicability and assess the performance of our proposed schemes, we have experimented with both comprehensive simulation and an Android-based real-world prototype. Adversary capabilities are modeled realistically. To encourage replication, we have shared our implementations of both the simulation and the Android-based prototype¹.
We have conducted a simulation of our proposed schemes using a custom simulator. User observations are generated randomly with a uniform distribution. By varying the number of OOIs and the anonymity preference of users, we have analyzed the performance of the algorithms for both single and multi-dimensional scenarios. As we are mostly interested in evaluating the performance of the proposed schemes in terms of data quality, we investigated how many observations are required to achieve different extents of de-anonymization. We use the term "de-anonymization rate" to present our results. The de-anonymization rate after T observations is defined as the proportion of the N OOIs that have been de-anonymized. We also analyze the impact of anonymity preference on the de-anonymization rate. All the results presented here are obtained by averaging 1000 simulation runs.
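The custom simulator itself is not described in detail, so the function below is only a compact stand-in under stated assumptions: a single dimension, a fixed k, uniformly random observations, unique attributes, and the unoptimized Rules 2-4. It returns the de-anonymization rate after a given number of observations; the function name and parameters are hypothetical.

```python
import random


def deanonymization_rate(n, k, observations, seed=0):
    """Fraction of the N OOIs that the ApS can de-anonymize after
    `observations` uniformly random reports (single dimension, fixed k,
    Rules 2-4 without the Section 4 optimization)."""
    rng = random.Random(seed)
    price = {p: 10 * (p + 1) for p in range(n)}  # one unique attribute per OOI
    ioc = {p: {q: 0 for q in range(n) if q != p} for p in range(n)}
    oc = {v: {} for v in price.values()}          # OC_v(q) at the ApS
    total = dict.fromkeys(price.values(), 0)      # T_v at the ApS
    for _ in range(observations):
        p = rng.randrange(n)                      # uniform observation
        others = list(ioc[p])
        rng.shuffle(others)
        others.sort(key=lambda q: ioc[p][q], reverse=True)
        ar = [p] + others[:k - 1]                 # AnS: Rule 2
        for q in others[k - 1:]:
            ioc[p][q] += 1                        # AnS: Rule 3
        v = price[p]                              # ApS receives (ar, v)
        total[v] += 1
        for q in ar:
            oc[v][q] = oc[v].get(q, 0) + 1
    resolved = 0
    for v, t in total.items():
        full = [q for q, c in oc[v].items() if c == t]
        if t > 0 and len(full) == 1:              # ApS: Rule 4
            resolved += 1
    return resolved / n


for t in (50, 100, 200, 400):
    print(t, deanonymization_rate(n=15, k=13, observations=t))
```

Averaging such runs over many seeds, as the paper does with 1000 runs, would give a de-anonymization rate curve; this sketch makes no claim to reproduce the paper's figures.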
In this section, we present the simulation results for the single-dimensional PSS scenario with the simple optimization explained in Section 4 applied. As mentioned earlier, our proposed scheme is scalable in the number of OOIs. Hence, we could experiment with PSS scenarios that have a reasonably large number of OOIs.
more than 200 reports are needed for \(k = 13\). However, the highest possible anonymity preference, i.e. \(k = 14\), requires a considerably higher number of observations, i.e. 375. This result indicates that a feasible k should be selected based on how frequently the OOIs are observed.
As discussed in Section 1, we are interested in scenarios where individual data need to be retrieved at the destination, so existing techniques such as spatial cloaking and obfuscation are not applicable in our context. Hence, k-anonymization techniques, e.g. PGAS [
Allowing anonymity in multiple dimensions while satisfying a different anonymity preference for each dimension is the most desirable property of an anonymization scheme. We achieved this without sacrificing data recoverability.
3000 to de-anonymize all OOIs with the highest anonymity, \(k_i = N_i - 1\). However, when the anonymity preference is reduced to half, i.e. \(k_i = N_i/2\), the required number of observations also declines significantly. For example, for \(N_l = 8\) and \(N_p = 4\), if \(k_l\) and \(k_p\) are reduced from \(k_i = N_i - 1\) to \(k_i = N_i/2\), the required number of observations decreases by 49%, i.e. to roughly half of that required for the highest anonymity. Hence, in the case of a very large number of OOIs in a multi-dimensional scenario, a PSS can vary the anonymity preference across dimensions in order to achieve good de-anonymization with a finite number of observations.
In the real world, an individual's privacy concern varies with many parameters, such as the culture of the society and family, job position, and age. Therefore, choosing a universal anonymity preference (k) for all users is sometimes impractical. Moreover, incentive schemes may reward a lower anonymity preference more if it helps achieve better de-anonymization. With this in mind, we evaluate the response of our proposed scheme to variable anonymity preferences. Without loss of generality, we show results for three different configurations in
An adversary residing near the ApS can eavesdrop on the ARs sent by the participants and thereby reveal the actual attributes of OOIs and find the users’ associations with the OOIs. We discussed the adversary model in detail in Section 2.2 and proposed a grouping strategy to mitigate the adversary risk. We also discussed some additional adversary capabilities. In our simulation, we evaluated MDEAS’s performance in the presence of adversaries with these additional capabilities when the grouping strategy is applied. We use
We simulated all 3 types of adversaries as defined in
collusion, i.e. the size of their colluding team,
We have compared the de-anonymization capability of different types of adversaries in
Colluding adversaries pose a greater privacy risk to the PSS, since more collusion means more shared information and more revealed ARs.
shows that G has little impact on the de-anonymization capability of the adversary.
From Figures 14-16, we conclude that intercepting ARs does not help adversaries in their de-anonymization. However, more collusion reveals more attributes and consequently increases the risk. Still, this privacy risk is not significant, since colluding groups comprising such a large portion of users are unlikely to exist in practice.
As discussed in Section 2.2, the adversary might make predictions about its set of most probable de-anonymized OOIs. For example, we explained how the adversary can predict location OOIs using distance estimation. We simulated such an adversary with location-prediction capability and show the results in
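A simple form of the distance-based prediction mentioned above can be sketched as ranking candidate location OOIs by great-circle (haversine) distance from an estimated user position, treating the closest candidates as the most probable. The adversary’s position estimate, the candidate coordinates, and the function names are assumptions for illustration; the adversary simulated in this work may use a different estimator.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def predict_location_ooi(user_pos, candidate_oois, top=1):
    """Hypothetical adversary: rank location OOIs by distance to the estimated user position.

    candidate_oois maps an OOI name to its (lat, lon); the `top` nearest names are returned,
    nearest first, as the adversary's most probable de-anonymized locations.
    """
    lat, lon = user_pos
    ranked = sorted(candidate_oois.items(),
                    key=lambda item: haversine_m(lat, lon, *item[1]))
    return [name for name, _ in ranked[:top]]
```

The haversine formula is used only because it is a standard way to compare distances on the Earth’s surface; for OOIs within a single city, a flat-Earth approximation would rank candidates almost identically.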
We have developed an Android-based software prototype as a proof of concept of our proposed scheme, which can be applied in real-world scenarios. Specifically, it has modules for users to send the actual report to the AnS, receive ARs from the AnS, and forward each AR with the user id to the ApS.
Using this application on Android smartphones (connected to the Internet and equipped with GPS), together with two separate servers acting as AnS and ApS built with the Python Tornado web framework, we tested our anonymization and de-anonymization algorithms. The user’s current location is obtained from the device’s GPS, while other information, such as the product and its actual attribute, is taken as user input. After receiving the anonymized report from the AnS, a user can send the report directly to the ApS, as shown in
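The prototype’s report path can be summarized as a minimal in-memory sketch: the actual report goes to the AnS, the AR comes back to the user, and the AR is forwarded with the user id to the ApS. The class and function names are hypothetical, the AR is treated as an opaque object because its format is not modelled here, and no networking or Tornado code is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ActualReport:
    """Hypothetical actual report assembled on the phone."""
    location: Tuple[float, float]  # (lat, lon) from the device GPS
    product: str                   # entered by the user
    attribute: float               # entered by the user, e.g. the observed price


@dataclass(frozen=True)
class ApsSubmission:
    """Hypothetical message received by the ApS: the user id plus the opaque AR."""
    user_id: str
    anonymized_report: object


def submit_observation(user_id: str,
                       report: ActualReport,
                       ans_anonymize: Callable[[ActualReport], object],
                       aps_inbox: List[ApsSubmission]) -> None:
    # Step 1: the phone sends the actual report to the AnS, which returns an AR.
    ar = ans_anonymize(report)
    # Step 2: the phone forwards the AR, tagged with its user id, to the ApS.
    # The ApS never receives the ActualReport object itself.
    aps_inbox.append(ApsSubmission(user_id=user_id, anonymized_report=ar))
```

For instance, `submit_observation("u42", ActualReport((23.78, 90.40), "rice", 55.0), lambda r: "AR#1", inbox)` appends one submission to `inbox`; the lambda is only a stand-in for the AnS’s anonymization algorithm.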
Assuring privacy commensurate with users’ contributions is the key factor in maintaining adequate participation in a PSS [
Hot-Potato-Privacy-Protection (HP3) [
in volunteer peers. Ensuring the trustworthiness of the peers is a major challenge for these schemes.
Pseudonym-based approaches are also common for protecting identity privacy from the ApS. However, a long-term pseudonym tends to be easily identified by an adversary. The mix-zone concept was proposed in [
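Why short-lived pseudonyms resist linking better than long-term ones can be illustrated by deriving one pseudonym per time epoch with HMAC-SHA256 from Python’s standard library. This is a generic rotation technique, not the mix-zone approach mentioned above; the key-holding party and the 10-minute epoch are assumptions. Provided HMAC behaves as a pseudorandom function, an observer without the key cannot link pseudonyms from different epochs to each other or to the user id.

```python
import hashlib
import hmac
import time

EPOCH_SECONDS = 600  # assumed pseudonym lifetime: 10 minutes


def pseudonym(secret_key: bytes, user_id: str, epoch: int) -> str:
    """Derive the pseudonym of `user_id` for one epoch (truncated HMAC-SHA256 hex digest)."""
    msg = f"{user_id}|{epoch}".encode()
    return hmac.new(secret_key, msg, hashlib.sha256).hexdigest()[:16]


def current_epoch(now=None) -> int:
    """Index of the epoch containing `now` (defaults to the current time).

    All reports a user sends within one epoch carry the same pseudonym.
    """
    t = time.time() if now is None else now
    return int(t // EPOCH_SECONDS)
```

A shorter epoch leaves fewer reports under each pseudonym, and therefore less material for linking, at the cost of the ApS being unable to correlate a user’s reports across epochs.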
Encryption is one of the most common approaches for protecting privacy in PSS. E. De Cristofaro et al. presented an approach where the server receives encrypted data and blindly performs computations on it. LotS [
Multi-secret sharing [
Different techniques for privacy protection have been proposed for PSS scenarios where the ApS is only interested in aggregated results. PriSense [
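For applications where the ApS needs only an aggregate, the idea of slicing each value into random shares can be sketched with additive secret sharing. This is a generic illustration, not the exact protocol of any scheme cited above. Integer readings, a shared prime modulus larger than the true total, and honest-but-curious participants are assumptions. `random.Random` is used only for reproducibility; a real deployment would draw shares with the `secrets` module.

```python
import random

PRIME = 2**61 - 1  # share modulus; the true total must be non-negative and below it


def make_shares(value: int, n: int, rng: random.Random) -> list[int]:
    """Split `value` into n additive shares whose sum mod PRIME equals `value`."""
    shares = [rng.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def aggregate(values: list[int], seed: int = 0) -> int:
    """Simulate slicing-based sum aggregation among len(values) participants.

    Participant i sends its j-th share to participant j (keeping share i itself).
    Each participant reports only the sum of the shares it holds, so the server
    learns the total while every single share looks uniformly random.
    """
    rng = random.Random(seed)
    n = len(values)
    all_shares = [make_shares(v, n, rng) for v in values]  # row i: shares of participant i
    held_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    return sum(held_sums) % PRIME
```

The server recovers exactly the sum of the inputs, while no single report reveals an individual value unless participants collude.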
Obfuscation was first introduced in [
Differential privacy (DP) is a relatively new paradigm based on the notion that aggregate properties of a large dataset remain essentially unchanged even if individual data are perturbed with controlled random noise. Many researchers have used DP to provide privacy protection in mobile crowdsourcing (task assignment), aggregation-based queries, and location-based services. For example, the DP-MDBScan scheme proposed in [
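The noise-addition idea behind DP can be illustrated with the standard Laplace mechanism for a counting query. Adding or removing one individual’s record changes a count by at most 1, so adding Laplace noise with scale $1/\varepsilon$ yields $\varepsilon$-differential privacy. The sketch is generic and is not the DP-MDBScan scheme; the records and predicate are placeholders. It draws Laplace noise as the difference of two independent exponential samples.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """epsilon-DP count. A count query has sensitivity 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, `private_count(range(100), lambda x: x % 2 == 0, 0.5, random.Random(7))` returns the true count of 50 plus noise with a typical magnitude of about 2. A smaller $\varepsilon$ gives stronger privacy and a noisier answer.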
Many privacy-preserving mechanisms [
In this paper, we have presented efficient algorithms for anonymization and de-anonymization of user observations in the context of PSS. To the best of our knowledge, this is the first work that presents an anonymization technique in multiple dimensions with flexible anonymity preferences. Theoretical analysis and simulation results show that our proposed scheme achieves sufficient data recoverability at the target end from a feasible number of user reports. We have also implemented an Android prototype and conducted real-world experiments. Our proposed approach is likely to help make participatory sensing a popular technology in the community by ensuring the privacy of participants without compromising data quality.
The authors declare no conflicts of interest regarding the publication of this paper.
Abrar, N., Zaman, S., Iqbal, A. and Murshed, M. (2020) Multi-Dimensional Anonymization for Participatory Sensing Systems. Int. J. Communications, Network and System Sciences, 13, 73-103. https://doi.org/10.4236/ijcns.2020.136006