# PHP code to generate samples from a normal distribution?

Discussion in 'Probability' started by Ray Koopman, Nov 4, 2010.

1. ### Ray KoopmanGuest

The normal distribution does not have a min or max. It goes to
infinity in both directions. Do you want to sample from a normal
with the specified mean and s.d., but retain only values that are
within the range you specify? (That would usually be referred to
as sampling from a truncated normal distribution.)

Ray Koopman, Nov 4, 2010

Apologies if this isn't the best ng for this question
(redirection appreciated, if not)

I need to generate random samples from a normal distribution
given min, max, mean, and sd. Can someone point me to the
source of a function that will do that?

It can be written in just about any general-purpose computer
language. Or even a semi-raw formula would be fine, if it's
parenthesised well enough that I can tell which operators have
precedence.

Many thanks in advance for any help! (Just in case: no, I'm not
asking for homework help, I got my undergrad degree ages ago; I'm
simply hopeless at anything arithmetic)

Yes, that's what I meant --discarding all values not within a
range. I suppose I must have been focusing on my purpose, which
is to sample from possible human ages (i.e., 0..122).

Thanks very much for responding!

4. ### Ray KoopmanGuest

This is probably the easiest way to do it:

1. Generate two independent Uniform(0,1) random values, u & v.
2. Let x = mean + sd*sqrt(-2*ln(u))*cos(PI*v).
3. If min < x < max then accept x; otherwise return to step 1.

'mean' & 'sd' are the mean & s.d. of the normal distribution before
removing values that are < min or > max. The mean of the truncated
distribution will be between 'min' and 'max', and the s.d. of the
truncated distribution will be less than 'sd'. It's possible to do
things the other way around, where you specify the mean & s.d. that
you want the truncated distribution to have, and then solve for the
values of 'mean' & 'sd' to use in the code, but it would be very
messy.

I don't know about using the normal to model the age distribution.
Maybe someone who does know about such things will comment.

Ray Koopman, Nov 4, 2010
5. ### HenryGuest

A variety of age distributions can be seen at

Although there is clearly a decline at very old age, the pattern for
young and middle age varies from place to place and time to time,
without any conventional distribution being useful. A normal
certainly would not do it, especially for the young.

Henry, Nov 5, 2010
6. ### Herman RubinGuest

The Box-Muller method should use random "reals" instead
of a highly discrete scale like 0 to 99. The above distribution
would have jumps in the cdf of size .0001, and especially bad tails.

There are several better, quicker, and more accurate methods
of generating normal random variables. Johnson and I have a
paper comparing our acceptance-replacement algorithm with
Marsaglia's reverse ziggurat, which is also a good method.

If the probability of the admissible range is high, I definitely
recommend a good method of generating unrestricted normal random
variables and rejecting those out of range. If it is not, I
would recommend using acceptance-rejection or acceptance-replacement
based on the concave logarithm of density approach.

Other distributions can be simulated. Nothing is
normally distributed in nature, although some (not age)
may be close.
Numerical analysis is an art; I suggest you consult
someone with the necessary artistry. The other parts
of your system likewise are going to need good
modeling and good mathematics.

Herman Rubin, Nov 5, 2010

Thanks very much! I'll have a go with it. I don't think doing
post-hoc truncation will be a problem.

Just to make sure I understand the operator precedence, would
this be a correct re-writing?

ran1 = ran_int( 0, 99 ) / 100
ran2 = ran_int( 0, 99 ) / 100
dp = mean + (( sd * sqrt((-2) * natlog( ran1 )) * cos( pi *
ran2))
Sorry, I was misleadingly terse in my nano-explanation: it's for
simulating a system. The population is expected to be bimodal in
age, and my aim is to sample accordingly for various modal
combinations and see what that does to later interactions within
subsystems.

8. ### Ray KoopmanGuest

...............^
The precedence is right, but you have an extra (. And as Herman has
already pointed out, the uniform random values must be continuous
reals, not discrete integers. If you want the result to be an integer
then round at the end. Also, the value whose log you take must be in
(0,1]; that is, greater than 0, and less than or equal to 1.
It sounds like exact normality isn't crucial to what you're doing.

Ray Koopman, Nov 6, 2010

This is where my dyscalculia gets me into trouble. I'd take it
kindly if you or Herman would explain in what way my thinking is

Herman notes that creating reals by my intended scheme of taking
pseudo-random integers 0-99 (or 0-100) and dividing by 99 (or
100) will result in a "randomness deficit" (I'm sure there's a
more-accepted term) of 0.0001.

To me, that doesn't seem like a problem for the case, as here,
where the end result must be constrained into the integer
sub-range (e.g.) 10-100.

No matter how perfectly destributed the raw result of the
calculation, ultimately it must reduce to one of those few
integers.

Or is the formula chaotic in some way? If it's not, I'd think
that such a small deficit would, at most, produce the occasional
(e.g.) 44 instead of 43 or 45, i.e. a skew invisible to the naked
eye

What am I not understanding? (If you can explain it in simple
enough terms, that is)

And as a practical matter, would it help (since I only have
integer rands in php, I think) if I were to take integer values
from the full 2^32 range and then divide by 2^32?
Very true. I was actually trying to get by with merely
averaging the sum of several integer rand calls, fiddling with
their ranges to control central tendency, but I couldn't persuade
myself that it was producing a sufficiently natural-looking
result. So if I can get something that "looks right" to the
naked eye, I'm sure that'll do everything necessary.

Many thanks for your continued help with this!

10. ### Ray KoopmanGuest

Most "continuous real" uniform(0,1) generators actually work just
as you describe -- they generate a uniform integer 1...n, where n
is very large, then float it and divide by n or n+1. For generating
normals using the algorithm described earlier, n determines the
maximum possible magnitude that the generated standard normal can
have, which is sqrt(2 natlog(n)). Also, if n is too small (e.g., 100)
then the distribution will have a "spike" at 0. 10^4 should be OK.

Ray Koopman, Nov 11, 2010