Very fast floating point PRNG

Discussion in 'Scientific Statistics Math' started by Ribeiro Alvo, Aug 25, 2009.

  1. Ribeiro Alvo

    Ribeiro Alvo Guest

    Ribeiro Alvo, Aug 25, 2009
    #1
    1. Advertisements

  2. Ribeiro Alvo

    Herman Rubin Guest

    It is undoubtedly fast, but what are its independence
    properties? Also, as computer words are of finite
    length, it must be periodic in practice, but agreed
    with a long period.
     
    Herman Rubin, Aug 26, 2009
    #2
    1. Advertisements

  3. Ribeiro Alvo

    Phoenix Guest

    Rubin, what do you mean with independence properties?

    About the period, I can say its a non periodic generator.

    Periodicity
    Because (n) is an integer growing, there are non chances to repeat
    cycles. In case of (X)=0 or 1, the next (X) will be equal to the (s).
    And the another next (X) it will be different, because (n) increase.
    Here an example to explain that. Take two different small sets from
    the whole sequence with one repeated number on each set, as follows:

    Xn= 0.234546748574634 Xn+1 = 0.783564763234561 Xn+2=
    0.674863564797320 …
    Xm= 0.356464763212724 Xm+1= 0.783564763234561 Xm+2= 0.098743643223776…
     
    Phoenix, Aug 26, 2009
    #3
  4. Ribeiro Alvo

    Vermute Guest

    Rubin, what do you mean with independence properties?
    About the period, I can say its a non periodic generator.

    Periodicity
    Because (n) is an integer growing, there are non chances to repeat cycles. In case of (x)=0 or 1, the next (x) will be equal to the (s). And the another next (x) it will be different, because (n) increase. Here an example to explain that. Take two different small sets from the whole sequence with one repeated number on each set, as follows:

    Xn = 0.234546748574634
    Xn+1 = 0.783564763234561
    Xn+2 = 0.674863564797320

    Xm = 0.356464763212724
    Xm+1 = 0.783564763234561
    Xm+2 = 0.098743643223776
     
    Vermute, Aug 26, 2009
    #4
  5. Ribeiro Alvo

    root Guest

    If these are subsequences of two real series generated by
    your algorithm, then I have a problem. You have 15 digit
    numbers, so the chance of getting two identical numbers
    in N trials is (1-Exp(-15))**N, which is e to the power of
    N*exp(-15). So for a 50% chance of a sequence having two
    identical numbers you would have had to examine something
    like 10**15 numbers. Since you seem to have found two
    identical numbers there is an evident non-randomness.


    Am I missing something?
     
    root, Aug 26, 2009
    #5
  6. Ribeiro Alvo

    Vermute Guest

    Root

    'Since you seem to have found two identical numbers there is an evident non-randomness.'

    See:

    If I have a coin with 0 and 1 and I flip it, I have, for example:

    00101110110110010001.....

    The zeros and ones, are repeated, however there are realy randomness.

    Repeated numbers in random series do not mean non-randomness.
     
    Vermute, Aug 26, 2009
    #6
  7. Ribeiro Alvo

    Ralf Guest

    Hello,
    in the Basic-Code it is unclear what the initial value of x is, isn't it?
    Can you please post a C source of it?
    Thx
     
    Ralf, Aug 26, 2009
    #7
  8. Ribeiro Alvo

    Vermute Guest

    Hello Ralf

    In BASIC code (x) starts with zero, and no need to be initialized.

    About C Source I send you an example to compare speed.

    you can adapt it to your porposes.

    ----------------------------
    #include <math.h>
    #include <stdio.h>
    #include <sys/time.h>

    #define LOOPS 1000000000

    static unsigned long long
    x=1234567890987654321ULL,c=123456123456123456ULL,
    y=362436362436362436ULL,z=1066149217761810ULL,t;

    #define MWC (t=(x<<58)+c, c=(x>>6), x+=t, c+=(x<t), x)
    #define XSH ( y^=(y<<13), y^=(y>>17), y^=(y<<43) )
    #define CNG ( z=6906969069LL*z+1234567 )
    #define KISS (MWC+XSH+CNG)

    static double alvo_s = 0.17890123456;
    static double alvo_x = 0;

    unsigned int timer (void)
    {
    struct timeval tv;
    gettimeofday (&tv, NULL);
    return tv.tv_sec*1000 + tv.tv_usec/1000;
    }

    int main (void)
    {
    int i;
    int start, end;
    unsigned long long k;
    double k2;

    start = timer();
    for (i = 0; i < LOOPS; ++i);
    end = timer();
    printf ("Empty loop: %5dms\n", end - start);

    start = timer();
    for (i = 0; i < LOOPS; ++i)
    k = KISS;
    end = timer();
    printf ("Kiss64 (int): %5dms\n", end - start);

    start = timer();
    for (i = 0; i < LOOPS; ++i)
    k2 = (KISS) / ((double)0xFFFFFFFFFFFFFFFFULL);
    end = timer();
    printf ("Kiss64 (dbl): %5dms\n", end - start);

    start = timer();
    for (i = 0; i < LOOPS; ++i)
    alvo_x = (alvo_s + i * alvo_x) - (int) (alvo_s + i * alvo_x);
    end = timer();
    printf ("Alvo (cast): %5dms\n", end - start);

    return 0;
    }
    -----------------
     
    Vermute, Aug 26, 2009
    #8
  9. Ribeiro Alvo

    Ralf Guest

    Thx,
    but how can I use time(0) to randomly initialize the seed correctly?
     
    Ralf, Aug 26, 2009
    #9
  10. Ribeiro Alvo

    Herman Rubin Guest

     
    Herman Rubin, Aug 26, 2009
    #10
  11. Ribeiro Alvo

    root Guest

    In your example there are only two possible values. In the
    prng there are at least 10**15 values. Finding two identical
    values in some [small] number of tries implies non-randomness.
     
    root, Aug 26, 2009
    #11
  12. Ribeiro Alvo

    Vermute Guest

    Vermute, Aug 26, 2009
    #12
  13. Ribeiro Alvo

    Ken Butler Guest

    Yes. The original poster said that *if* two random numbers at
    different points in a long sequence were identical, *then* the
    preceding and succeeding random numbers will be different. (This is an
    argument for lack of periodicity, at least in an infinite-precision
    computer).

    In other words, the two threesomes of numbers were cooked up to
    illustrate the point.
     
    Ken Butler, Aug 26, 2009
    #13
  14. Ribeiro Alvo

    root Guest

    OK, you are saying the two sets of three were not "observed".
     
    root, Aug 26, 2009
    #14
  15. Hardly. Let's say you are on a platform with 32 bit ints. Your internal state is 32 + 32 bit long (the state variable x, and the
    integer n), which means that you have *at most* 2^64 distinct states of the generator. Since your generator uses a deterministic
    algorithm to compute the next number from the last, the algorithm *must* run into a cycle at least after 2^64 iterations - simply
    because there are no more states left it hasn't visited yet. And as soon as a single state is visited for a second time, the
    sequence repeats all over since it is deterministic.
    Wrong. It will provably repeat after at least 2^64 cycles. Probably earlier.

    Anyhow. I suggest to read:

    Donald Knuth: The Art of Programming, Volume II: Numerical and Seminumerical algorithms.

    Specifically, read the part about Knuth's "Home made random generator". Both entertaining and enlightening. The punchline is:

    "A good random generator should not be generated at random."


    Greetings,
    Thomas
     
    Thomas Richter, Aug 28, 2009
    #15
  16. Ribeiro Alvo

    Phoenix Guest


    Thomas I dont agree with you.

    I write this in the page:

    Periodicity
    Because (n) is an integer growing, there are non chances to repeat
    cycles. In case of (x)=0 or 1, the next (x) will be equal to the (s).
    And the another next (x) it will be different, because (n) increase.
    Here an example to explain that. Take two different small sets from
    the whole sequence with one repeated number on each set, as follows:

    x_n = 0.234546748574634
    x_n+1= 0.783564763234561
    x_n+2= 0.674863564797320

    x_m= 0.356464763212724
    x_m+1= 0.783564763234561
    x_m+2= 0.098743643223776


    If R is the repeated number, B = frac(s+nR), is the next number after
    R on the first set, and C = frac(s+mR), is the next number after R on
    the second set; for B=C, must frac(s+nR)=frac(s+mR), or n=m and that
    is impossible. Then with a different value for C, we must begin a new
    period. Imagine that we want to generate 2**53 bit random numbers.
    After, and if exhaust the whole 253 bit, we can still have different
    permutations of numbers and don’t find any repeated periods, again,
    because (n) is growing. That’s like as a box with 2**53 different
    numbers, and we pick one, next we pick another one, and so on until
    the last number. With a 2**53 numbers set, we can have 9007
    199254740992! permutations of different cycles/periods. But the
    algorithm can repeat the numbers, and because that, the value of the
    cycles is bigger. If you want to do sets of 10, you have 2**530
    different periods, on sets of 100 you have 2**5300 different periods.
    For a set of ‘n’ you have 2**53n different periods. In this sense, I
    can say this prng is ‘only’ limited by the value of ‘n’ and by
    floating point precision, truncation and/or rounding.

    Read the Ken Butler post, and the page.

    Thanks
     
    Phoenix, Aug 28, 2009
    #16
  17. Why are a so large lot of intelligent people spending (loosing) time with a so lousy Random Number Generator?


    Luis A. Afonso
     
    Luis A. Afonso, Aug 30, 2009
    #17
  18. Too bad, since all this is elementary logic.

    Which is wrong. Which of the above arguments didn't you understand?
    Except that n does have a limited precision. It will wrap around at some point.

    So, here again, is the argument in simple to follow steps (if you don't understand any of these
    steps, let me know).

    o) The internal state of the algorithm consists of a number n and a number x_n, both
    represented with a final number of bits (say 32 bit each, in the following).

    o) There are 2^64 different states in the state space (namely the possible values of x and n).

    o) The algorithm is a mapping of the set of states of the machine onto itself, that is,

    (y,m) = Phi(x,n) = (frac(s+n*x),n+1)

    This is a mapping from the set of 2^64 elements into itself.

    o) Consider the orbit of Phi for a given starting point (x_0,1), i.e. the set of points
    Phi^M(x_0,1) = the set of points created by the generator starting from (x_0,1). Phi^M means "apply the iteration
    M times on the input". The iteration is the random generator.

    o) Since there are only finitely many different states, there are M1 and M2, M1 != M2, such that
    Phi^{M1}(x_0,1) = Phi^{M2}(x_0,1), i.e. at one point, Phi will output the same numbers.

    Note that Phi encodes *both* the random number *and* the value of n!
    Thus, by incrementing n only by one each time, M1 and M2 must differ by multiplies of 2^32.

    o) Since Phi is only a permutation of the 2^64 elements, Phi will reach after at most 2^64 cycles one of
    the states it already visited before, that is, M1 and M2 differ by at most by 2^64. Note again that
    a state is a collection of a random variable x, and the counter n.

    o) Now that we have seen that M1 and M2 exist, note that Phi^{M1+k} = Phi^{M2+k} for all k >= 0. This is simply because
    Phi is, from steps M1 and M2 on, working on the same input states, thus the above follows by induction.
    And since the output only depends on the input by the definition of Phi, outputs are again identical from that point on.

    o) Ergo, Phi has a period, starting (at least) at M1 and with length of (at least) M2-M1.

    In fact, this is a special case of a general theorem, every self-mapping of a system with a finite number
    of states *must* run into a period (with the constant cycle being a period of length 1) after a while, and the period is
    (at most) the size of the set the map operates on.

    There is nothing mystical about it. But the statement that the random generator cannot create cycles is simply wrong, and
    it cannot have a period longer than 2^64.

    If you don't believe that, use a toy-example of a four-bit version of the above, i.e. let x have states 0,1,2,3 and n
    values 0,1,2,3 and try yourself. You then don't have to wait so long for the states to repeat, after at least 16 iterations you're
    back at a point where you have been before already. Not necessarily the starting point, though, but some point you've visited already.

    So long,
    Thomas
     
    Thomas Richter, Aug 30, 2009
    #18
  19. Why are a so large lot of intelligent people spending (loosing) time with a so lousy Random Number Generator?
    **************************
    2^64 distinct states of the generator!!! How uninformative/ misleading is such a rough theoretical value = 0.2E20.
    Let us check the uniformity of 1E6 random number values distributed among k= 1001 (from 0 to 1000) classes, each one with amplitude 0.001. Such a sample should follow a Chi-square Distribution with 1000 degrees of freedom the 99.99 percentile = 1174.93.
    These examples are sufficient to classify the RGN as horrendous (Program PUFF):
    ____Chi-2_________min. content____max.content
    ___13.15E6__________11____________41627___
    ___10.94E6__________16____________40634___
    ___17.26E6__________10____________58883___
    ___10.10E6__________10____________37656___
    ____9.11E6__________ 7____________35357___
    Comparing with the built-in QBASIC
    ___1407____________493_____________1112___
    ___1371____________505_____________1093___
    ___1408____________507_____________1099___
    ___1381____________497_____________1100___
    ___1422____________501_____________1097___
    Comment: In fact not a high quality generator; however it did provide good results (particularly in what concerns to simulate normal samples) these last four years.
    Any critics relative to my action (opinion/ statements/ solutions) in this news are really welcomed if in intention to enlighten, discuss and improve the maters to which we are committed.


    Luis A. Afonso


    REM "puff"
    CLS
    DEFDBL S, X
    DEFLNG N, Q
    DIM h(1000), chi(1000)
    q = 1000000: REM one million
    seed = TIMER - INT(TIMER)
    s = seed / 7
    mxxu = 0: ixu = 10 ^ 10
    FOR n = 1 TO q: w = s + n * x
    x = w - INT(w)
    xu = INT(1000 * x + .5)
    h(xu) = h(xu) + 1
    NEXT n
    FOR j = 0 TO 1000
    chi(j) = (h(j) - q / 1000) ^ 2 / (q / 1000)
    PRINT USING "#### ######## "; j; h(j);
    IF h(j) > mxxu THEN mxxu = h(j)
    IF h(j) < ixu THEN ixu = h(j)
    chi = chi + chi(j)
    NEXT j
    PRINT
    PRINT chi
    PRINT ixu, mxxu
    END


    REM "puff0"
    CLS
    PRINT " puff0 "
    DIM h(1000)
    q = 1000000
    chi1 = -100: chi2 = q * 1E+20
    RANDOMIZE TIMER
    FOR rpt = 1 TO 20
    FOR t = 0 TO 1000: h(t) = 0: NEXT t
    FOR n = 1 TO q
    x = INT(RND * 1000 + .5)
    h(x) = h(x) + 1
    NEXT n
    FOR j = 0 TO 1000
    chi = chi + (h(j) - q / 1000) ^ 2 / (q / 1000)
    NEXT j
    PRINT USING "####.#### "; chi;
    IF chi < chi2 THEN chi2 = chi
    IF chi > chi1 THEN chi1 = chi
    chi = 0
    NEXT rpt
    PRINT USING "#######.#### "; chi2; chi1
    END
     
    Luis A. Afonso, Aug 30, 2009
    #19
  20. Ribeiro Alvo

    Phoenix Guest

    Luis A. Afonso

    Your code forgotten to put the variable (w) as double precision.

    DEFDBL S,X,w

    This makes much difference.

    Or, if you prefer use

    x=s + n * x-INT(s + n * x)

    without the variable x
     
    Phoenix, Aug 31, 2009
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.