Simple probability question (volume vs. probability)

Discussion in 'Scientific Statistics Math' started by golabidoon, Dec 28, 2007.

  1. golabidoon

    golabidoon Guest

    Hello all,

    Assume that unit cube in R^3 is randomly and uniformly sampled by N
    points. Let's call this set of points S(N). Now consider a new sample
    point X drawn from the same distribution. Denote the nearest element
    of S(N) to X by Y. I would like to find the probability that the
    distance between X and Y is below some constant D. The constants N and
    D are given, i.e. the ultimate probability should hold for any S and X
    and must only depend on N and D.

    Your help would be appreciated

    G.D.
     
    golabidoon, Dec 28, 2007
    #1
    1. Advertisements

  2. wrote in @c4g2000hsg.googlegroups.com:
    Musings of a former physics major: I am not sure that it is simple. My
    subject line would have been "Distance to nearest neighbor in 3-space".
    My first hunch was a Rayleigh distribution, but that distribution is
    derived from distances to neighbors in a Gaussian distribution.

    I found treatments of the one dimensional analogue of this problem in a
    text on order statistics where the term given to the distances to the
    next order statistic was "spacings". I would think, perhaps naively, that
    an arbitrary additional point would have at most 1/2 of a "spacing
    instance" to the nearest neighbor, and also perhaps naively, that the
    distribution within that 1/2-distance would be uniform. (The 1-D spacings
    are distributed as Poisson. So I my probability guesswork says the
    nearest neighbor distance would also be Poisson, but I have not done the
    math.)

    I would have expected to find a worked solution in studies of gas
    dynamics, electron physics, interstellar medium, or perhaps even in
    physical systems where electrons are bumping into crystal defects ... but
    I failed to find a reference that gave an immediate answer. Search terms
    used without success in various combinations included "3 dimensional
    uniform", "nearest neighbor", "spacings". You might try posting to one of
    the theoretical physics newsgroups and see if my hunch, that this has
    been solved in that domain, pans out.
     
    David Winsemius, Dec 31, 2007
    #2
    1. Advertisements

  3. A couple of ideas. (1) Work with the square of the distance to
    avoid a square root which would probably make life more difficult.
    (2) The cumulative distribution function of the minimum of
    a set of independent, identically distributed variables
    {d(X, Y)^2 : Y in S(N)} is just (1 - F(u))^N where F is the cdf of
    any one of them. (3) The pdf of d(X, Y)^2 is going to be
    a convolution pdf(d1^2) * pdf(d2^2) * pdf(d3^2)
    where d1 = x1 - y1, d2 = x2 - y2, d3 = x3 - y3.

    Maybe you've already figured out that much. As suggested
    elsewhere in this thread, it seems this problem must have been
    solved already in some context ....

    Hope this is useful in some way.

    Robert Dodier
     
    Robert Dodier, Dec 31, 2007
    #3
  4. golabidoon

    Herman Rubin Guest

    It is easier to look upon it the other way.

    The distribution of X is uniform. For any position
    x of X, let h(x,d) denote the volume of the intersection
    of the sphere of radius d around x with the unit cube,
    which is of volume 1. Then the probability that the
    distance is greater than D is

    int (1-h(x,D))^N dx,

    integrating over the cube.
     
    Herman Rubin, Dec 31, 2007
    #4
  5. Oops, that should be 1 - (1 - F(u))^N.
    I've spent some time noodling on this and made a little progress.
    It appears the pdf of d(X, Y)^2 is a function defined piecewise,
    with the piece for d^2 in [0, 1] being
    (-3*x^2+16*x^(3/2)-6*%pi*x)/6+(sqrt(x)*(4*x+6*%pi)-6*%pi*x)/3
    (there being a piece for [1, 2] and one for [2, 3] also, but so far
    I am too lazy to work them out), as determined by computing
    the convolution directly from the definition (f conv g)(x) =
    integral(f(t) * g(x - t), t, -inf, inf),
    with pdf(d1^2) being f(x) = 1/sqrt(x) - 1 (and g = f in this case).

    The piece of the cdf corresponding to the pdf above appears to be
    Pr(d1^2 < D) = (-5*D^3+sqrt(D)*(48*D^2+40*%pi*D)-45*%pi*D^2)/30
    for D in [0, 1]. Incidentally I've computed the integrals with the
    computer program Maxima; %pi is Maxima's symbol for 3.14....

    Completing the solution is left as an exercise for the reader.
    Dunno if I'll get around to it.

    HTH

    Robert Dodier
     
    Robert Dodier, Jan 1, 2008
    #5
  6. I confirm your result for the cdf. Of course your D is the square of
    the original poster's D. I used Mathematica to work on this, and have
    arrived at a cdf for all three pieces. I'll use "d" for the original
    poster's variable.

    For d in [0,1]:

    (8*d^5)/5 - d^6/6 + (4*d^3*Pi)/3 - (3*d^4*Pi)/2

    For d in [1,Sqrt[2]]:

    1/10 - d^2/2 + (3*d^4)/2 + d^6/3 -
    (2*Sqrt[-1 + d^2]*(-2 + 9*d^2 + 8*d^4))/5 - Pi/2 + 3*d^2*Pi -
    (8*d^3*Pi)/3 + 6*d^4*ArcSec[d]

    For d in (Sqrt[2],Sqrt[3]]:

    9/10 - (5*d^2)/2 - (3*d^4)/2 - d^6/6 +
    (2*Sqrt[-2 + d^2]*(1 + 9*d^2 + 4*d^4))/5 - (17*Pi)/2 + 3*d^2*Pi +
    (3*d^4*Pi)/2 - (16*d^3*ArcTan[Sqrt[d^2/(-2 + d^2)]])/3 +
    8*ArcTan[(2 - d)/Sqrt[-2 + d^2]] + 8*ArcTan[(2 + d)/Sqrt[-2 + d^2]] -
    6*(-3 + 2*d^2 + d^4)*ArcTan[Sqrt[-2 + d^2]] +
    (16*d^3*ArcTan[Sqrt[d^2*(-2 + d^2)]])/3

    This last expression is undefined for d==Sqrt[2], but it has the
    correct limiting value. The "ArcTan"s in the expression give some
    hope for simplification, but they didn't seem to collapse as well as
    the ones that had existed in the middle expression prior to
    simplification.

    Scott
     
    Scott Hemphill, Jan 3, 2008
    #6
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.