• Ingen resultater fundet

The distribution of communication cost for a mobile service scenario

N/A
N/A
Info
Hent
Protected

Academic year: 2022

Del "The distribution of communication cost for a mobile service scenario"

Copied!
10
0
0

Indlæser.... (se fuldtekst nu)

Hele teksten

(1)

The distribution of communication cost for a mobile service scenario

June 13, 2009

Jesper Møller, Department of Mathematical Sciences, Aalborg University, Fredrik Bajers Vej 7G, DK-9220 Aalborg Ø, Denmark.

Email: jm@math.aau.dk.

Man Lung Yiu, Department of Computer Science, Aalborg University, Selma Lagerl¨ofs Vej 300, DK-9220 Aalborg, Denmark.

Email: mly@cs.aau.dk.

Abstract: We consider the following mathematical model for a mobile ser- vice scenario. Consider a planar stationary Poisson process, with its points radially ordered with respect to the origin (the anchor); these points may correspond to locations of e.g. restaurants. A user, with a location different from the origin, asks for the location of the first Poisson point and keeps asking for the location of the next Poisson point until the first time that he can be completely certain that he knows which Poisson point is his nearest neighbour. The distribution of this waiting time, called the communication cost, is analysed in detail. In particular the expected communication cost and the asymptotic behaviour as the distance between the user and the anchor increases are of interest.

Keywords: nearest-neighbour search; Poisson process; radial simulation al- gorithm; waiting time.

1 Introduction

The mobile Internet offers services that e.g. receive the location of the nearest point of interest such as a store, restaurant, or tourist attraction, see Yiu et al. (2008, 2009) and the references therein. Tools from applied probability

(2)

and in particular stochastic geometry should be useful for an analysis of the performance of such services.

We illustrate this by considering a user located at a given point q ∈ R2 and data points p1, p2, . . . ∈R2 corresponding to the locations of e.g. stores.

In order to preserve some privacy, the user queries a server for nearby points but he reports not his correct locationqbut another locationq0 ∈R2 referred to as the anchor. An incremental query processing on the server is used so that the data points are ordered in increasing distance to the anchor. The user then stops to queries the server as soon as possible, i.e., when the nearest data point with respect to qcan be determined. The waiting time for this to happen is called the communication cost and is denoted M; further details are given in Section 2.

The purpose of this note is to analyse the distribution of M assuming that the data points follow a stationary Poisson process Φ = {p1, p2, . . .}

with intensity ρ > 0 (see, e.g., Kingman, 1993, or Chapter 3 in Møller and Waagepetersen, 2004). In particular the expected communication cost and the asymptotic behaviour as the distance between q and q0 increases are of interest. Section 3 discusses our results.

2 The communication cost

We use the following notation. By stationarity and isotropy of Φ, we can without loss of generality take q0 = (0,0) and q = (l,0), where l > 0 is the distance between the anchor and the user location. Let (Ri, θi) ∈ [0,∞)× [0,2π) be the polar coordinates ofpi with respect to the anchor (the origin), and order the data points such that the Ri are increasing, R0 := 0 ≤ R1 ≤ R2 ≤ . . .. Let p = pi if pi be the first NN (nearest-neighbour) to q among p1, p2, . . .. For i = 1,2, . . ., let qi denote the first NN to q among the i first data points p1, . . . , pi, and let ˜Ri =kq−qikdenote the distance from qtoqi. Moreover, let B(x, r) = {y∈ R2 :ky−xk ≤r} denote the closed disc with centre x∈R2 and radius r ≥0.

Then the communication cost M is defined as the first time m such that the user can be completely ensured that p = qm when he has only received p1, . . . , pm from the server, i.e., no matter where the points pm+1, pm+2, . . . potentially could be located in R2\B(q0, Rm). In other words, if for m∈N, we define the demand space Dm and the supply space Sm by

Dm = disc(q,R˜m), Sm = disc(q0, Rm), then

M = inf{m∈N:Dm ⊆ Sm} (1)

(3)

(setting inf∅=∞). Furthermore,

p=qM (2)

is returned as the NN to q, provided of course M <∞. See Figure 1.

p

1

p

2

p

3

q’ q

Figure 1: An example withM = 3, showing the corresponding demand space D3 and supply space S3.

3 Results

Since M = 1 implies that p1 lies on the halfline H with endpoint q and directionq−q0, and this happens with probability zero, we have almost surely that M ≥2 andRM ≥l. Similarly, with probability one, for m∈ {2,3, . . .}, M = m implies that pm cannot be the NN to q among p1, . . . , pm, because this would imply that pm ∈H. Moreover, with probability one, the sequence R1, R2, . . . is strictly increasing to infinity, and so the sequence of supply spaces S1 ⊂ S2 ⊂ . . . tends to R2. On the other hand, the sequence of demand spaces decreases. Consequently, by (1) and (2), with probability one, 2≤M < ∞, D1 ⊇ D2 ⊇. . .⊇ DM, and

p=qM =qM−1, R˜M = ˜RM−1, RM ≥l. (3) The distribution of M may be estimated by Monte Carlo methods us- ing a radial simulation algorithm due to Quine and Watson (1984). The radial simulation algorithm simply utilizes the fact that the squared radial coordinates R12, R22, . . . form a homogeneous Poisson process with intensity πρ on the positive halfline, so R2i −R2i−1, l = 1,2, . . ., are independent and exponentially distributed with mean 1/(πρ), and they are independent of the

(4)

angular coordinates θ1, θ2, . . ., which in turn are independent and uniformly distributed on [0,2π). The following Lemma 1 follows straightforwardly from these properties. In Lemma 1, when we consider the first m data points and ignore their radial ordering, we write{x1, . . . , xm}for the corresponding point configuration. Moreover, Unif([0,2π)) denotes the uniform distribution on [0,2π), and Γ(α, β) the gamma distribution with shape parameter α and in- verse scale parameter β.

Lemma 1 For anym ∈N, we have the following.

(i) R2m ∼Γ(m, πρ) is independent of θm ∼Unif([0,2π)).

(ii) Conditional on Rm+1 = r, the configuration of the first m points {x1, . . . , xm} is a binomial point process on B(q0, r) (i.e., the xi are independent and uniformly distributed on B(q0, r)).

(iii) Conditional on Rm+1 =r, we have that θm ∼Unif([0,2π)) is indepen- dent of Rm, where Rm/r ∼ B(1, m−1). If m ≥ 2 and we condition on bothRm+1 =r and (Rm, θm) with Rm =t, then{x1, . . . , xm−1}is a binomial point process on B(q0, t).

We now turn to exact results for the distribution of the communication cost M, where its distribution function turns out to be more tractable than its probability density function, and we can obtain an expression for the expectation EM. We need the following notation. Let

fm+1(r) = 2(πρ)m+1

m! r2m+1exp −πρr2

, r >0, (4)

denote the density function of Rm+1, cf. (i) in Lemma 1. For r ≥ l and 0≤ϕ <2π, let

v(ϕ, r) = r2 −l2sin2ϕ1/2

−lcosϕ. (5) Finally, for 0≤r−l ≤s≤v(ϕ, r), define

g(r, s) = |B(q0, r)\B(q, s)|

|B(q0, r)| = 1−h(r, s)/ πr2

(6) where

h(r, s) =|B(q0, r)∩B(q, s)| (7)

=r2arccos

l2+r2 −s2 2lr

− l2+r2−s2 4l2

h

4l2r2− l2+r2−s22i1/2

+s2arccos

l2+s2−r2 2ls

−l2+s2−r2 4l2

h

4l2s2− l2+s2−r22i1/2

.

(5)

Here | · | denotes area (Lebesgue measure), and we suppress in the notation that the functions in (5)-(7) depend on l >0.

Proposition 1 For m= 2,3, . . ., we have that P(M > m) =

Z l

0

fm+1(r) dr+

m Z

l

Z

0

Z v(ϕ,r)

r−l

s

πr2g(r, s)m−1fm+1(r) dsdϕdr (8) and

EM = 2 +πρl2− Z l

0

1 +πρr2

exp −πρr2

dr+ (9)

Z

l

Z

0

Z v(ϕ,r)

r−l

2πρ2sr

exp(−ρh(r, s))−exp −πρr2

dsdϕdr is finite.

ProofUsing (1) and (3), considering each of the casesRm+1 < landRm+1 ≥l, we see that the probability that M > m is equal to the sum of two terms, viz the term

P(Rm ≤l) = Z l

0

fm+1(r) dr and the term

Z

l m

X

i=1

P(xi is the NN toq amongx1, . . . , xm and

B(q,kxi−qk)6⊂B(q0, r)|Rm+1 =r)fm+1(r) dr.

By (ii) in Lemma 1,

P(xi is the NN toq amongx1, . . . , xm and (10) B(q,kxi−qk)6⊂B(q0, r)|Rm+1 =r)

= Z

B(q0,r)\B(q,r−l)

|B(q0, r)\B(q,kx−qk)|

|B(q0, r)|

m−1

1

|B(q0, r)|dx. (11) Hence, by making a shift of coordinates from

x=q+s(cosϕ,sinϕ)

(6)

to the polar coordinates (s, ϕ), we obtain (8) after a straightforward calcula- tion, noticing the following facts: in (11) we can set q= (l,0) andq0 = (0,0);

for fixed r > l and ϕ ∈ [0,2π), since x ∈ B(q0, r)\B(q, r−l), we obtain that s ranges from r−l to v(ϕ, r); here the latter bound follows from the equation k(l,0) +s(cosϕ,sinϕ)k = r, which has only one solution because s >0. See also Figure 2.

l q'

xi

q r

s

ϕ

Figure 2: Illustration of the notation used in the proof of Proposition 1. The functionv(ϕ, r) is the upper bound onsconsidering a case with idata points such that xi =qi is the nearest-neighbour to q, B(q0, r) is the corresponding supply space, and B(q, s) is the corresponding demand space.

Since M − 2 is a non-negative discrete random variable, we have that EM = 2 +P

m=2P(M > m). Combining this with (8), we obtain EM = 2 +

Z l

0

X

m=2

fm+1(r) dr +

Z

l

Z

0

Z k(ϕ,r)

r−l

s πr2

X

m=2

mg(r, s)m−1fm+1(r) dsdϕdr.

From this and (4), we easily obtain (9).

Since k(ϕ, r) ≤ r +l, it follows from (9) that EM is finite if for some number r0 > l,

I :=

Z

r0

Z r+l

r−l

sr

exp(−ρh(r, s))−exp −πρr2 dsdr

is finite. This is indeed the case, since for any >0, ifr0 > l+is sufficiently large, thenh(r, s)≥π(r−l−)2 whenever r≥r0 and r−l≤s ≤r+l, and so

0≤I ≤2l Z

r0

r2exp −πρ(r−l−)2

dr−2l Z

r0

r2exp −πρr2 dr

(7)

where both integrals are finite.

The integrals in (8) and (9) can be evaluated by numerical methods.

In (9), as l increases towards infinity, EM is dominated by the term πρl2, meaning that the dominating part of the expected communication cost is proportional to ρ and l2. It is worth noticing that πρl2 is the expected number of points in the disc b(q0, l).

Note that the distribution of M depends only on (ρ, l) through √ ρl.

Figure 3 visualizes the difference between EM and its dominant term πρl2 (shown as ‘O’), where for different values of l and withρ fixed to 1, EM has been estimated both by using (9) and a numerical method (shown as ‘×’) and by the Monte Carlo method based on 1000 independent simulations obtained by the radial simulation algorithm (shown as ‘5’). The axes in Figure 3 are shown in the log-scale, the estimates of EM obtained by the two methods are rather close, and the dominating term of EM is seen to converge to EM as l → ∞. The figure also shows the Monte Carlo estimates of the 5% and 95% quantiles (shown as ‘+’).

Figures 4 and 5 show further Monte Carlo estimates with ρ = 1 and based on 1000 independent simulations obtained by the radial simulation algorithm for each tested value of l. Figure 4 suggests an approximate log- linear relation between √

VarM /EM and l. Considering values of l≥2, the regression line in Figure 4 was estimated by the method of least squares to y = 0.5524x−0.9147, where (x, y) corresponds to (log10l,log10

VarM /EM) (log10 denotes the logarithm function with base 10). Figure 5 shows four histograms of the distribution of M corresponding to the simulations with l = 0.1,1,10,100. For the larger values of l, the distribution of M seems to be well approximated by a Gaussian distribution.

Acknowledgements:

Supported by the Danish Natural Science Research Council, grant 272-06- 0442, ”Point process modelling and statistical inference”.

References

[1] J. F. C. Kingman. Poisson Processes. Clarendon Press, Oxford, 1993.

[2] J. Møller and R. P. Waagepetersen. Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall/CRC, Boca Raton, 2004.

(8)

[3] M. P. Quine and D. F. Watson. Radial simulation of n-dimensional Pois- son processes. Journal of Applied Probability, 21:548–557, 1984.

[4] M. L. Yiu, C. S. Jensen, X. Huang, and H. Lu. SpaceTwist: Manag- ing the trade-offs among location privacy, query performance, and query accuracy in mobile services. In M. Castellanos, A. P. Buchmann, and K. Ramamritham, editors, Proceedings of the 24th IEEE International Conference on Data Engineering, pages 366–375. IEEE Computer Soci- ety, Los Alamitos, CA, 2008.

[5] M. L. Yiu, C. S. Jensen, J. Møller, and H. Lu. Design and analysis of an incremental approach to location privacy for location-based services.

Submitted for publication, 2009.

0.01 0.1 1 10 100 1000 10000 100000

0.1 1 10 100

Number of received points

l = dist(q,q')

5% and 95% quantiles of simulated cost Average of simulated cost Numeric value of E(M) Dominating term of E(M)

Figure 3: Estimates of EM vs. l when ρ = 1 and EM is estimated by the result in Proposition 1 using a numerical method (shown as ‘×’) and by the Monte Carlo method (shown as ‘5’), together with the dominant term πρl2 (shown as ‘O’) and Monte Carlo estimates of the 5% and 95% quantiles (shown as ‘+’).

(9)

0.001 0.01 0.1 1

0.1 1 10 100

l

sqrt(VarM)/EM

Figure 4: Monte Carlo estimates of√

VarM /EM vs. l when ρ= 1.

(10)

0 100 200 300 400 500 600 700 800

2-2 3-3 4-4 5-5

l=0.1

0 20 40 60 80 100 120 140 160

2-2 4-4

6-6 8-8

10-1 0

12-1 2

14-1 4

16-1 6

18-1 8

20-2 0

l=1

0 20 40 60 80 100 120 140 160

261-268 278-286

296-304 313-321

331-339 349-356

366-374 384-392

401-409 419-427

l=10

0 20 40 60 80 100 120 140

31029-31104 31182-31258

31336-31412 31489-31565

31643-31719 31797-31872

31950-32026 32104-32180

32257-32333 32411-32487

l=100

Figure 5: Histograms of the distribution of M corresponding to 1000 inde- pendent simulations with ρ= 1 andl = 0.1,1,10,100.

Referencer

RELATEREDE DOKUMENTER

Freedom in commons brings ruin to all.” In terms of National Parks – an example with much in common with museums – Hardin diagnoses that being ‘open to all, without limits’

At different set pressures and times, the flow rate and velocity values of the pumps have been calculated where the pressure head of the pump changes for sudden closing and opening

fluidities of borders and related phenomena of migration, globalization and interdependence; with the fact that the term has increasingly been adopted by ethnic/minority groups

1) The stop on sales of petrol and diesel cars proposed by the government in its Climate and Air Policy Proposal has not been included in the best guess scenario but is covered by

If Internet technology is to become a counterpart to the VANS-based health- care data network, it is primarily neces- sary for it to be possible to pass on the structured EDI

Biomass estimation using large footprint size LiDAR has been applied in different stages such as tropical forest where determination values for the model of 0.94 with and Root

The ANEW [1] list has been extended with an algorithm using Latent Semantic Analysis [2] to compute term to term similar- ities which were then used to derive aective estimations

Value Delivery Architecture: Paym, as a mobile payment service offered by the UK banking consortium, has, on its value delivery architecture, direct access to