
### Contribution of each user to the dataset

The guarantees given by Bassolas et al. rely on the assumption that each user contributes no more than one trip to the dataset. Analyses in the paper strongly suggest that users contribute more than one trip to the dataset.

Indeed, the final dataset used by Bassolas et al. contains connections, i.e. numbers of origin–destination trips, with at least 100 trips each. The dataset contains, for example, 46,333 connections for Atlanta (population 5 M). Assuming each user reports exactly one trip, at least 4.6 million people in Atlanta would have had to contribute to the dataset to reach the number of connections it reports. Since only 67% of mobile phone users in the United States use Google Maps as their primary navigation application^{7}, we find this unlikely, which strongly suggests that some users contributed more than one trip to the dataset.
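The arithmetic behind this argument can be checked directly; a minimal sketch, using the figures quoted above:

```python
# Back-of-envelope check of the numbers above (figures taken from the text):
# 46,333 connections, each with at least 100 trips, imply at least
# 4.6 million trips -- i.e. 4.6 million distinct users under the
# one-trip-per-user assumption, against a metro population of about 5 M.
connections = 46_333
min_trips_per_connection = 100
min_users = connections * min_trips_per_connection
print(min_users)  # 4,633,300
```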

Regarding unique trips, the authors later confirmed that each user contributes the list of their unique weekly trips to the weekly aggregate. If the same trip (A → B) is made several times in a week, it is only counted once. In the case of one of the authors, who made 39 trips in a given week, this means that he contributes 32 unique trips to the weekly aggregate while the remaining 7 are discarded. Note that here unique refers to trips that are unique for a given user during a given week.
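The weekly deduplication can be sketched on synthetic data; the location names below are made up, and the 39/32 split mirrors the example above:

```python
# A user's weekly trips as (origin, destination) pairs; only distinct
# pairs count towards the weekly aggregate (synthetic illustration).
week_trips = [(f"loc{i}", f"loc{i + 1}") for i in range(32)]  # 32 distinct trips
week_trips += week_trips[:7]  # 7 repeats of earlier trips, 39 trips in total
unique_trips = set(week_trips)
print(len(week_trips), len(unique_trips))  # 39 trips, 32 unique
```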

### Generating trips from empirical data

We use a longitudinal mobility dataset extracted from CDR data. Each individual trajectory contains points with a time and an approximate location (antennas). We segment the trajectories using a winner-takes-all approach, selecting the most used location for each hour, and define a trip as a movement from one location to another between consecutive hours.
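A minimal sketch of this segmentation, assuming CDR points are simply `(hour, antenna_id)` pairs (the data layout is an assumption, not the authors' format):

```python
from collections import Counter

def hourly_locations(points):
    """points: iterable of (hour, antenna_id). Return, for each hour, the
    most frequently observed antenna (winner-takes-all)."""
    per_hour = {}
    for hour, antenna in points:
        per_hour.setdefault(hour, []).append(antenna)
    return {h: Counter(ants).most_common(1)[0][0] for h, ants in per_hour.items()}

def extract_trips(points):
    """A trip is a change of location between two consecutive hours."""
    locs = hourly_locations(points)
    hours = sorted(locs)
    return [(locs[h1], locs[h2])
            for h1, h2 in zip(hours, hours[1:])
            if h2 == h1 + 1 and locs[h1] != locs[h2]]

# Hour 0 is dominated by antenna "A", hours 1 and 2 by "B" and "C":
points = [(0, "A"), (0, "A"), (0, "B"), (1, "B"), (2, "C")]
print(extract_trips(points))  # [("A", "B"), ("B", "C")]
```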

### Executing the attack

We follow the procedure described by Bassolas et al. to aggregate the anonymized trips: compute the origin–destination count matrix of unique trips, add zero-mean Laplace noise of scale 1/*ε* to each entry, and remove all (noisy) counts below 100.
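The three steps can be sketched as follows; this is not the authors' code, and the default `epsilon=0.66` is the per-trip value used later in this text:

```python
import numpy as np

def aggregate(trips, epsilon=0.66, threshold=100, rng=None):
    """Sketch of the aggregation pipeline described above: count
    origin-destination trips, add zero-mean Laplace(1/epsilon) noise
    to each entry, and drop noisy counts below the threshold."""
    rng = np.random.default_rng() if rng is None else rng
    counts = {}
    for od in trips:
        counts[od] = counts.get(od, 0) + 1
    noisy = {od: c + rng.laplace(0.0, 1.0 / epsilon) for od, c in counts.items()}
    return {od: c for od, c in noisy.items() if c >= threshold}

# Synthetic example: a heavily used connection survives the low-count
# filter, a rare one is suppressed.
trips = [("A", "B")] * 500 + [("C", "D")] * 3
released = aggregate(trips, rng=np.random.default_rng(0))
```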

We then use the attack model the authors rely on to compute the 16% improvement over a random guess: the standard membership inference attack with perfect knowledge. In this model, a powerful attacker has access to all records in the dataset except the victim's, as well as auxiliary information about the victim.

More precisely, for *k* between 0 and 70, we select a user *u* with exactly *k* trips. The attacker performs a membership attack to test whether the anonymized data *D** they received is *D*^{+} (anonymized trajectories with *u* included) or *D*^{–} (without *u*). We compute the local origin–destination matrix *A*(*u*) for user *u* and, by linearity of the noise addition, compute the normalized matrix *A*(*D**) *–* *A*(*D*^{–}) generated from either no user or *u*. We perform a likelihood ratio test to distinguish whether the normalized matrix was sampled from a Laplace distribution *L*(0, 1/*ε*) or *L*(*A*(*u*), 1/*ε*).
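The likelihood ratio test reduces to comparing absolute deviations from the two candidate centres; a sketch under the distributions named above (illustrative entries and ε, not the authors' code):

```python
import numpy as np

def log_likelihood_ratio(normalized, A_u, epsilon):
    """Log-likelihood ratio of the normalized matrix under Laplace noise
    centred on A(u) (user included, D+) versus centred on 0 (user
    excluded, D-). Positive values favour membership; constant terms of
    the Laplace density cancel in the ratio."""
    scale = 1.0 / epsilon
    ll_present = -np.abs(normalized - A_u).sum() / scale
    ll_absent = -np.abs(normalized).sum() / scale
    return ll_present - ll_absent

A_u = np.array([1.0, 0.0, 2.0])  # the victim's local origin-destination counts
# A normalized matrix close to A(u) points to D+ ...
assert log_likelihood_ratio(A_u, A_u, epsilon=0.66) > 0
# ... while one close to 0 points to D-.
assert log_likelihood_ratio(np.zeros(3), A_u, epsilon=0.66) < 0
```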

We repeat this procedure 10,000 times for each value of *k* between 0 and 70 and report the average in Fig. 1.

### Theoretical limits

The theoretical bound reported by Bassolas et al. is obtained by bounding the posterior probability *π*(*y*) of an attacker trying to infer whether a user is in the dataset. Formally, let *D** be the tested dataset, *D*^{+} the dataset with user *u* included, and *D*^{–} the dataset without *u*. If the attacker's prior carries no information (i.e., *P*[*D** = *D*^{+}] = 0.5), we then have for all *y* (and for *M* an ε-DP mechanism^{8}):

$$\frac{\pi (y)}{1-\pi (y)} = \frac{P[{D}^{\ast }={D}^{+}\,|\,M({D}^{\ast })=y]}{P[{D}^{\ast }={D}^{-}\,|\,M({D}^{\ast })=y]} = \frac{P[M({D}^{+})=y]}{P[M({D}^{-})=y]} \le {e}^{\varepsilon }$$

which then implies *π*(*y*) ≤ *e*^{ε}/(1 + *e*^{ε}).
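Evaluating this bound with the per-trip ε = 0.66 used later in this text recovers the figure quoted earlier, roughly a 16% improvement over a random guess:

```python
import math

# Numerical check of the bound pi(y) <= e^eps / (1 + e^eps) for eps = 0.66:
# the attacker's posterior is capped at about 0.659, i.e. roughly 16
# percentage points above the 0.5 baseline of a random guess.
eps = 0.66
bound = math.exp(eps) / (1 + math.exp(eps))
print(bound)  # ~0.659
```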

### Conservative estimate of the privacy loss

To estimate the privacy loss for any user in 1 week of data, we assume conservative limits: each user contributes only once to each count and makes no more than 70 unique trips per week (10 per day). Let *m*_{trips} be the maximum number of unique trips that any user could contribute to the data; then the L_{1} sensitivity of the count matrix is *m*_{trips}. Adding Laplace(1/*ε*) noise and low-count filtering yields (*m*_{trips} × *ε*, 2.1 × 10^{−29})-differential privacy by direct application of simple composition bounds^{9}.

Likewise, the privacy loss for a year of data can be estimated as the sum of the privacy losses for each week. A reasonable estimate of the total loss for the data release is thus 52 times the privacy loss for one week, *ε*_{total} = 52 × *m*_{trips} × *ε* = 2402.4.
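The composition arithmetic above, spelled out with the values used in this text (*m*_{trips} = 70, per-trip ε = 0.66):

```python
# Simple (sequential) composition: epsilon adds up across the 70 possible
# weekly contributions and across the 52 weekly releases.
m_trips = 70
epsilon = 0.66
weekly_loss = m_trips * epsilon  # ~46.2 per week
total_loss = 52 * weekly_loss    # ~2402.4 for a year of releases
print(weekly_loss, total_loss)
```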

Note that although tighter bounds can be obtained, they require higher values of *δ*^{8,9}. In this specific case, acceptable values of *ε* would require prohibitive values of *δ*, rendering the guarantees meaningless in practice.
