# Help with probability&stat problem

Discussion in 'Scientific Statistics Math' started by tutorny, Jun 11, 2007.

1. ### Jack Tomsky (Guest)

My response

My calculation of the probability as 0.9753 was exact, while your calculation of 0.9786 was a mediocre approximation. The professor would have given me an A for my answer and given you an F for your answer.

Jack

Jack Tomsky, Jun 12, 2007

2. ### Luis A. Afonso (Guest)

Binomial law DF: a simple program

N=110, p=0.2

p(X<=110) = 1.0000000000
p(X<=10) = 0.0015595228
p(X<=22) = 0.5567271744
p(X<=30) = 0.9752864841
p(X<=50) = 0.9999999996

Licas

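REM "BINcum": cumulative binomial probability F(a) = p(X<=a)
CLS
DEFDBL A-Z
PRINT " F(a) = p(X<=a) X=Bin(p, N) "
INPUT " p , N "; p, n
INPUT " a ( a<=n ) "; a
w = p / (1 - p)                             ' odds p/(1-p), used in the recurrence
DIM pp(n)                                   ' pp(j) holds p(X=j)
pp(0) = (1 - p) ^ n: s = pp(0)              ' start from p(X=0)
IF a = 0 THEN GOTO 10
FOR j = 0 TO n - 1
IF j > a - 1 THEN GOTO 10                   ' stop once p(X=a) has been added
pp(j + 1) = pp(j) * (n - j) / (j + 1) * w   ' p(X=j+1) from p(X=j)
s = s + pp(j + 1)
NEXT j
10 LOCATE 10, 50: PRINT USING "#.########## "; s
END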
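
For readers without a BASIC interpreter, the same recurrence can be sketched in Python. This is only an illustration of the listing above, not a replacement for it; the function name `binom_cdf` and the guard on `p` are assumptions added here.

```python
def binom_cdf(a, n, p):
    """P(X <= a) for X ~ Bin(n, p), using the term-by-term recurrence
    p(X=j+1) = p(X=j) * (n-j)/(j+1) * p/(1-p) from the BINcum listing."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    if a < 0:
        return 0.0
    w = p / (1.0 - p)
    term = (1.0 - p) ** n      # p(X=0)
    total = term
    for j in range(min(a, n)):
        term *= (n - j) / (j + 1) * w
        total += term
    return total


if __name__ == "__main__":
    # Same cases as in the post: N=110, p=0.2
    for a in (110, 10, 22, 30, 50):
        print(f"p(X<={a}) = {binom_cdf(a, 110, 0.2):.10f}")
```

Like the BASIC version, this starts from (1-p)^N, so it is only suitable while that quantity does not underflow to zero.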

Luis A. Afonso, Jun 13, 2007

3. ### Jack Tomsky (Guest)

Binomial law DF: a simple program

Now for the same cases, compare these simple exact results with the Afonso normal approximation employing his continuity correction.

p(X<=110)= 1.0000000000
p(X<=10) = 0.0030607156
p(X<=22) = 0.5474347428
p(X<=30) = 0.9786231409
p(X<=50) = 1.0000000000

The conclusion is that it is better to use the simple exact binomial calculations.
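
The approximate figures above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the usual form of the approximation, F((a + 0.5 - Np)/sqrt(Np(1-p))), with the standard normal CDF built from `math.erf`; the function names are made up for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_normal_approx(a, n, p, continuity=True):
    """Normal approximation to P(X <= a) for X ~ Bin(n, p).

    With continuity=True the half-unit continuity correction is added to a.
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    x = a + 0.5 if continuity else float(a)
    return normal_cdf((x - mu) / sigma)


if __name__ == "__main__":
    # Same cases as in the post: N=110, p=0.2
    for a in (110, 10, 22, 30, 50):
        with_cc = binom_normal_approx(a, 110, 0.2)
        without_cc = binom_normal_approx(a, 110, 0.2, continuity=False)
        print(f"p(X<={a}): with C.C. {with_cc:.10f}, without C.C. {without_cc:.10f}")
```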

Jack

Jack Tomsky, Jun 13, 2007
4. ### Luis A. Afonso (Guest)

EXACT : p(X<=10) = 0.00156

Normal Approximation

*** Without C.C.
Z = (10 - 22)/sqrt(0.2*0.8*110) = -12/4.1952… = -2.8604
F(Z) = 0.00212
*** With C.C.
Z = (10 - 22 + 0.5)/4.1952… = -2.9796
F(Z) = 0.00144

The differences approx-EXACT are respectively

Without C.C. = +0.00056
With C.C. = -0.00012

... and all Jack Tomsky's argumentation falls to earth.
*********************************************

Licas

Luis A. Afonso, Jun 13, 2007
5. ### Anon. (Guest)

exact test, and made the comparison to the approximation that the OP had
used. Luis has used a different approximation, and come to a slightly

But you both appear to agree on the exact calculation.

Bob

--
Bob O'Hara

Dept. of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org

Anon., Jun 13, 2007
6. ### Luis A. Afonso (Guest)

1) The discussion was about whether the C.C. worsens the results compared with the raw calculation. IT DID NOT.
2) The two normal approximations have rather different accuracies (they ARE NOT slightly different, contrary to what Bob said).
Relative errors (absolute values):
Not using C.C.: (212-156)/156 = 35.9%
Using C.C.: (144-156)/156 = 7.7%
Therefore this last procedure improves the result almost FIVE TIMES. To use it is an indisputable MUST.
**********************

Licas

Luis A. Afonso, Jun 13, 2007
7. ### Jack Tomsky (Guest)

EXACT : p(X<=10) = 0.00156

(10-22+0.5)/4.1952 = -2.7412, not -2.9796. Thus F(z) = 0.00306, not 0.00144.

Jack

Jack Tomsky, Jun 13, 2007
8. ### Jack Tomsky (Guest)

The exact probability calculated from the binomial distribution is 0.0015595.

The Afonso normal approximation is 0.002115.

The "improved" Afonso normal approximation with a correction factor is 0.003061, which is worse.

Jack

Jack Tomsky, Jun 13, 2007
9. ### Anon. (Guest)

I'll leave it to others to decide whether 0.9717 and 0.9786 are more
than slightly different,

But I find this curious: are you really saying that one must use the
Normal approximation with a continuity correction, rather than the exact
binomial calculation? If so, why? And why can't this be disputed?

Bob


Anon., Jun 13, 2007
10. ### Jack Tomsky (Guest)

Bob, what this shows is that with a p of 0.20, the binomial distribution is too asymmetrical to be effectively approximated by a symmetrical normal distribution, with a mean of either 22 or 21.5. The Poisson would probably give a much better approximation.
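
The Poisson suggestion is easy to test. The sketch below is one way to do it in Python, assuming the Poisson mean is matched to the binomial mean, lambda = Np = 22 (the thread does not say which lambda was intended); `poisson_cdf` is a name made up for illustration. Its output can be set against the exact value 0.0015595228 and the two normal figures quoted in this thread.

```python
import math


def poisson_cdf(a, lam):
    """P(X <= a) for X ~ Poisson(lam), summing terms via
    p(X=k) = p(X=k-1) * lam / k, starting from p(X=0) = exp(-lam)."""
    if a < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for k in range(1, a + 1):
        term *= lam / k
        total += term
    return total


if __name__ == "__main__":
    n, p = 110, 0.2
    lam = n * p  # assumption: Poisson mean matched to the binomial mean
    print(f"Poisson({lam:g}) p(X<=10) = {poisson_cdf(10, lam):.7f}")
    print("Exact binomial p(X<=10)  = 0.0015595 (from the thread)")
```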

Jack

Jack Tomsky, Jun 13, 2007
11. ### Luis A. Afonso (Guest)

Bob wrote:

*** I'll leave it to others to decide whether 0.9717 and 0.9786 are more than slightly different, ***

My response

You SHOULD NOT leave it to others but THINK ABOUT it. The way I posted is the *one* to use when the sign of the error is unimportant.
You made an error of analysis: the errors must be judged not by the Z values but by the associated tail probabilities, evidently.
*********************

Bob:

*** But i find this curious: are you really saying that one must use the Normal approximation with a continuity correction, rather than the exact binomial calculation? If so, why? And why can't this be disputed? ***

My response

OF COURSE NOT: I didn't and I'll never say such nonsense.

The normal approximation is used because it is immediate. The exact way, on the contrary, needs computer programming. If that is available, do not hesitate: use it. If not, the only *decent* way is to use the continuity correction.

IT'S SIMPLE, ISN'T IT?
**********************

Licas

Luis A. Afonso, Jun 13, 2007
12. ### Jack Tomsky (Guest)

For the binomial distribution calculations in Excel, you don't need to know BASIC programming. You just use the BINOMDIST function. It takes about 20 seconds to type in the arguments.

The normal distribution, based on NORMSDIST, takes longer because you have to calculate the argument (x-pN+0.5)/sqrt(p*(1-p)*N). What you achieve with the added time consumed is an inaccurate approximation of an asymmetric distribution by a symmetric distribution.

Jack

Jack Tomsky, Jun 13, 2007
13. ### Jack Tomsky (Guest)

I think that it is indecent to approximate the exact Prob(X<=10) of 0.001560 by a normal approximation of 0.002115 and then to apply a correction factor that makes it even worse, at 0.003061.

Jack

Jack Tomsky, Jun 13, 2007
14. ### Luis A. Afonso (Guest)

Follow-up, in SHORT, the continuity correction leads to better results:

| | With C.C. | EXACT | Without C.C. |
|---|---|---|---|
| p(X<=10) | 0.00144 | 0.00156 | 0.00212 |
| Diff | 0.00012 | | 0.00056 |

***************
See my post of Jun 13, 2007, 7:29 AM, and APPRECIATE that Jack Tomsky DID FALSIFY my evaluation.

I WROTE (ipsis verbis, please check):
Date: Jun 13, 2007 7:29 AM
Author: Luis A. Afonso
Subject: Re: Help with probability&stat problem

EXACT : p(X<=10) = 0.00156
Normal Approximation
*** Without C.C.
Z = (10 - 22)/sqrt(0.2*0.8*110) = -12/4.1952 = -2.8604
F(Z) = 0.00212
*** With C.C.
Z = (10 - 22 + 0.5)/4.1952 = -2.9796
F(Z) = 0.00144
The differences approx-EXACT are respectively

Without C.C. = +0.00056
With C.C. = -0.00012
********

Jack's response:

The exact probability calculated from the binomial distribution is 0.0015595. The Afonso normal approximation is 0.002115. The "improved" Afonso normal approximation with a correction factor is 0.003061, which is worse. Jack

*******************************
Licas

Luis A. Afonso, Jun 13, 2007
15. ### Anon. (Guest)

Note that you describe it as something that must be done, i.e. that
using the exact test must not be done, and don't qualify it at all. I
guess you just let your rhetoric run away with you, which is why I checked.

But there, it's all cleared up now.

Bob


Anon., Jun 13, 2007
16. ### Jack Tomsky (Guest)

Follow-up, in SHORT, the continuity correction leads

I was correcting your arithmetic. The actual result with CC is 0.00306, leading to a difference of 0.00306-0.00156 = 0.00150, which is even worse than the 0.00056 error without the CC.

Do we agree that 10-22+0.5 = -11.5? Do we agree that -11.5/4.1952 = -2.7412? Do we agree that F(-2.7412) = 0.003061?

I thought that you would appreciate that I was able to check out your calculations and correct the arithmetic. Or was it your BASIC program which did the miscalculation?

If the normal distribution used for the approximation has a smaller mean of 21.5 instead of 22, then the cdf must be larger for all x. So it should have been a red flag that you would get a smaller estimate with the CC than without the CC.
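
The arithmetic in question can be checked directly. The sketch below (Python; the helper names `normal_cdf` and `cc_z` are made up for illustration) evaluates the z-score with the half-unit shift added, as in the formula quoted in this thread, and also with it subtracted. The subtracted variant is only an assumption about where a value like -2.9796 might come from; nothing in the thread confirms it.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cc_z(x, n, p, shift):
    """z-score (x - Np + shift) / sqrt(Np(1-p)) for X ~ Bin(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return (x - mu + shift) / sigma


if __name__ == "__main__":
    # N=110, p=0.2, x=10, as in the thread
    for label, shift in (("no C.C.", 0.0), ("+0.5", 0.5), ("-0.5", -0.5)):
        z = cc_z(10, 110, 0.2, shift)
        print(f"{label:8s} z = {z:.4f}  F(z) = {normal_cdf(z):.6f}")
```

Adding 0.5 moves z toward zero, so the lower-tail probability can only grow relative to the uncorrected value, which is the point about the red flag made above.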

Jack

Jack Tomsky, Jun 13, 2007
17. ### Luis A. Afonso (Guest)

Everybody curious to learn the logic of this procedure should consult

Wikipedia: Continuity correction

Licas

Luis A. Afonso, Jun 13, 2007
18. ### Jack Tomsky (Guest)

Everybody curious to learn the logic of this

What Wikipedia failed to mention is that one could get the exact result from Excel in about 20 seconds, using BINOMDIST and inputting the three arguments plus "TRUE" to obtain the cumulative binomial. For example, in the case of p = 0.20, N = 110 and x = 10, the normal approximation overestimates the true probability and then adding the continuity correction term further exaggerates the error. It is also so complicated to apply that a typical user such as Afonso could not calculate the numbers correctly.

Jack

Jack Tomsky, Jun 14, 2007
19. ### Luis A. Afonso (Guest)

The BASIC program I made, whose listing is presented in this thread, provides the cumulative probability INSTANTANEOUSLY even if I enter X=110 (N=110, p=0.2). I wonder why anyone would prefer the extremely slow Excel. TWENTY SECONDS when X=30? WHAT AN ETERNITY!!!
*********
Licas

Luis A. Afonso, Jun 14, 2007
20. ### Luis A. Afonso (Guest)

The BASIC program I made, whose listing is presented in this thread, provides the cumulative probability INSTANTANEOUSLY even if I enter X=110 (N=110, p=0.2). I wonder why anyone would prefer the extremely slow Excel. TWENTY SECONDS when X=30? WHAT AN ETERNITY!!!
*********
IMPROVED PROGRAM

This program (listing below) is able to evaluate the cumulative probabilities of Bin(p=0.000001, N=10^6). It takes about 10 seconds to evaluate F(X=10^6), giving the value 1.0000000000D+000.

Licas

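REM "BINcum": cumulative binomial probability, keeping only two terms in memory
CLS
DEFDBL A-Z
PRINT " F(a) = p(X<=a) X=Bin(p, N) "
INPUT " p , N "; p, n
INPUT " a ( a<=n ) "; a
w = p / (1 - p)                        ' odds p/(1-p), used in the recurrence
ante = (1 - p) ^ n: s = ante           ' ante = p(X=0)
IF s = 0 THEN GOTO 20                  ' (1-p)^n underflowed to zero: give up
IF a = 0 THEN GOTO 10
FOR j = 0 TO n - 1
IF j > a - 1 THEN GOTO 10              ' stop once p(X=a) has been added
post = ante * (n - j) / (j + 1) * w    ' p(X=j+1) from p(X=j)
s = s + post
ante = post
NEXT j
10 LOCATE 10, 50: PRINT USING "##.##########^^^^^"; s
END
20 PRINT " p(0)=0 "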
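
The listing above stops with " p(0)=0 " whenever (1-p)^N underflows to zero in double precision. One common way around that, sketched here in Python rather than BASIC, is to compute each term from its logarithm. This is an alternative technique, not the method of the program above; the function names are made up for illustration, and the sketch assumes 0 < p < 1.

```python
import math


def log_binom_pmf(k, n, p):
    """log of P(X = k) for X ~ Bin(n, p), using log-gamma for the
    binomial coefficient so that no intermediate value underflows."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf_log(a, n, p):
    """P(X <= a) as a sum of exp(log pmf); assumes 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    if a < 0:
        return 0.0
    a = min(a, n)
    total = math.fsum(math.exp(log_binom_pmf(k, n, p)) for k in range(a + 1))
    return min(1.0, total)  # guard against rounding slightly above 1


if __name__ == "__main__":
    print(f"{binom_cdf_log(10, 110, 0.2):.10f}")   # case from this thread
    # 0.5 ** 10000 underflows to 0.0, yet the CDF is still computable:
    print(0.5 ** 10000, binom_cdf_log(5000, 10000, 0.5))
```

The price is one `lgamma` evaluation per term instead of one multiplication, so for small N the recurrence in the BASIC listing remains the cheaper choice.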

Luis A. Afonso, Jun 14, 2007