# Jarque-Bera test: confidence intervals for normal data

Discussion in 'Scientific Statistics Math' started by Luis A. Afonso, Mar 7, 2007.

1. ### Luis A. Afonso (Guest)

You are stupid, even for a Biologist, Jack.

As I stressed before (in the FIRST post of this THREAD, Mar 7), the values I found (by Monte Carlo simulation) are the 95% and 99% fractiles of the JB (Jarque-Bera) statistic. They allow testing the hypothesis H0: the sample is from a normal distribution, against H1 (Ha): the sample is non-normal.
If you had noted it before: JB is never negative, because it is a sum of squares (multiplied by constants); consequently the CONFIDENCE INTERVALS are [0, fractile(1 - alpha)], with the two significance levels alpha = 0.05 and 0.01.
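The statistic in question can be sketched in a few lines of Python (the programs later in this thread are in BASIC; this translation is illustrative only, using the standard moment definitions):

```python
import math
import random

def jarque_bera(x):
    """Jarque-Bera statistic: JB = (n/6) * (S^2 + (K - 3)^2 / 4),
    where S is the sample skewness and K the sample kurtosis."""
    n = len(x)
    m1 = sum(x) / n
    m2 = sum((v - m1) ** 2 for v in x) / n   # central moments
    m3 = sum((v - m1) ** 3 for v in x) / n
    m4 = sum((v - m1) ** 4 for v in x) / n
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    return (n / 6.0) * (S * S + (K - 3.0) ** 2 / 4.0)

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(50)]
jb = jarque_bera(sample)
```

Since S^2 and (K - 3)^2 are squares, JB >= 0 always, which is why the acceptance interval has the form [0, fractile(1 - alpha)].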

If you are so SURE of what you are saying, you should openly criticize, as you do me, the authors of

1) Jarque - Bera Test and its Competitors for Testing Normality, Thorsten Thadewald and Herbert Buning (March 14, 2004),
2) Precise finite-sample quantiles of the Jarque-Bera adjusted Lagrange multiplier test, Diethelm Wurtz and Helmut G. Katzgraber (August 2005).

IS IT A DEAL? Let's see what your guts are made of!!!!!!!

__________licas (Luis A. Afonso)

Luis A. Afonso, Mar 10, 2007

2. ### Jack Tomsky (Guest)

You are stupid, even for a Biologist, Jack.

These are not confidence intervals because then every sample would have the same confidence interval.

Jack

Jack Tomsky, Mar 10, 2007

3. ### Luis A. Afonso (Guest)

*** These are not confidence intervals because then every sample would have the same confidence interval. Jack ***
... and consequently, facing the population of normal N(0,1) random samples of size n, you deny that 95% of the means are expected to lie in the interval
______[-1.960 / sqrt(n), 1.960 / sqrt(n)]
Isn't it?
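That claim is easy to check by simulation; this quick Python sketch (illustrative only) draws many N(0,1) samples and counts how often the sample mean falls in [-1.960/sqrt(n), 1.960/sqrt(n)]:

```python
import math
import random

random.seed(2)
n, reps, z = 25, 20000, 1.960
half = z / math.sqrt(n)          # half-width of the 95% interval for the mean

hits = 0
for _ in range(reps):
    mean = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
    if -half <= mean <= half:
        hits += 1
coverage = hits / reps           # should be close to 0.95
```

With 20000 repetitions the observed coverage lands within a fraction of a percent of 0.95.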

Facing the Wikipedia definition:

*** In statistics, a confidence interval (CI) for a population parameter is an interval between two numbers with an associated probability p which is generated from a random sample of an underlying population, such that if the sampling was repeated numerous times and the confidence interval recalculated from each sample according to the same method, a proportion p of the confidence intervals would contain the population parameter in question.***

What, IN SIMULATION terms, is the algorithm?
Synthesize a great number (2 million) of n-sized samples, evaluate the test statistic (JB, for example) for each, memorize its values, and then evaluate the fractiles from the empirical distribution. ALL FREQUENTIST STATISTICIANS (like me) are very comfortable with this procedure, and from the middle of the 60's until now there have been millions of papers in this context.
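The algorithm just described can be sketched in Python (with far fewer replications than the 2 million mentioned, purely to illustrate the mechanics):

```python
import random

def jb_stat(x):
    """Jarque-Bera statistic from the sample's central moments."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    return (n / 6.0) * (S * S + (K - 3.0) ** 2 / 4.0)

random.seed(3)
n, reps = 20, 5000   # far fewer replications than the 2 million in the post
stats = sorted(jb_stat([random.gauss(0.0, 1.0) for _ in range(n)])
               for _ in range(reps))

# empirical 95% and 99% fractiles under H0 (normal data)
q95 = stats[int(0.95 * reps)]
q99 = stats[int(0.99 * reps)]
```

More replications sharpen the fractile estimates; the precision question is taken up later in the thread.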
Are you so *brave* as to state that they are all WRONG?
If so, fight them and leave me in peace and quiet; I am not even an *atom* in this CROWD. In fact I'm working for you: I pointed out at least 4 teams that followed the procedure I used: GO AND EAT THEM! If you are interested I'll give you the names of a full battalion, even an army!!!!!!!!!

_________licas (Luis A. Afonso)

Luis A. Afonso, Mar 10, 2007
4. ### Jack Tomsky (Guest)

*** These are not confidence intervals because then

Under the Afonso Theory of Statistics, all confidence intervals are the same and are independent of the sample. Any information contained in the sample is ignored. Even your Wikipedia reference says that confidence intervals depend on the sample.

Similarly, under the Afonso Theory of Statistics, one is never allowed to accept the hypothesis that 8/13 is greater than 5/13.

Jack

Jack Tomsky, Mar 11, 2007
5. ### Luis A. Afonso (Guest)

I do hesitate to classify you:
_________a clown?
(I discard the hypothesis that he is stupid).

MEANWHILE
I have the reward of finding out, and showing with pleasure to the Readers, that with a very poor tool (a computer) and scarce programming skills, everyone can *replicate* results that are both *educative* and *exact*. As exact as the ones we read in textbooks and tables.

CONCLUSION
___*Everyone is wrong except Jack Leon Tomsky*. What an odd thing!!!!!!!!

________licas (Luis A. Afonso)

Luis A. Afonso, Mar 11, 2007
6. ### Jack Tomsky (Guest)

I do hesitate to classify you:

There are no books in any language which give confidence intervals independent of the sample. I challenge you to find any book which does this.

Jack

Jack Tomsky, Mar 11, 2007
7. ### Luis A. Afonso (Guest)

Jack:

The critical (WRONG) idea of NOT ADMITTING that simulated samples are SAMPLES in their own right puts you in such a PECULIAR situation that you are compelled to deny all the work that has been done from H. LILLIEFORS (1967) up to now.
I repeat
Put your comments and criticisms together (you can use the title *Against the wrong way scientists find critical values, or simulated samples are not samples*) and try to publish them, if you are sufficiently persuaded you are RIGHT.
I dare a Referee's panel of a serious journal to admit such *trash*. This is the second time I invite you to do so.
(Take good care: "Everyone is wrong except me" is a symptom of madness, or of GENIUS.)
YOU HAVE TO CHOOSE: EITHER YOU ARE A CRAZY OLD MAN OR YOU ARE SO BRILLIANT THAT YOU WILL LEAD A REVOLUTION IN STATISTICS. (The choice is yours.)
MEANWHILE, I would appreciate it if you did not disturb my work with your PECULIAR idea, not accepted by statisticians, of what are genuine, truthful samples and simulated, false ones.
IS IT A DEAL?
_________licas (Luis A. Afonso)

Luis A. Afonso, Mar 11, 2007
8. ### Jack Tomsky (Guest)

Jack:

For years now, Wikipedia and I have been trying in vain to teach you about confidence intervals, hypothesis tests, and the difference between parameters and sample statistics. You still maintain that confidence intervals are independent of the sample, that the null hypothesis can never be accepted, and that we can never know if 8/13 > 5/13.

What is it about you that makes you incapable of learning? I will continue to correct your errors in the forum until you get it right.

Jack

Jack Tomsky, Mar 11, 2007
9. ### Luis A. Afonso (Guest)

IF we are able to deduce, by the first principles, the mathematical expression of a sample statistic's distribution, the quantiles (say 5%-95%, 1%-99%) provide us with the critical values, and therefore the acceptance intervals for the parameters under study throughout the hypothesis test.
If we are not, we can (possibly) simulate the sample statistic a sufficiently high number of times (for example 2 million) and evaluate the critical values (for a pre-defined alpha) from this *population*. The number of simulations is directly connected with the *precision* with which the critical value is obtained.
Cf.
*How many replications in Monte-Carlo replications?
____V.K. Stokes.
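The replications-versus-precision point can be sketched with a standard Monte Carlo rule of thumb (an assumption of this sketch, taken from general asymptotic theory, not from Stokes's paper): the standard error of an empirical p-quantile from N draws is approximately sqrt(p(1-p)/N) / f(q_p), where f is the density at the true quantile q_p.

```python
import math
import random

p, N = 0.95, 100_000
true_q = 1.6449   # 95% quantile of N(0,1), known analytically

# density of N(0,1) at the true quantile
f_at_q = math.exp(-true_q ** 2 / 2) / math.sqrt(2 * math.pi)

# predicted Monte Carlo standard error of the empirical quantile
se = math.sqrt(p * (1 - p) / N) / f_at_q

# empirical check: estimate the quantile from N simulated draws
random.seed(4)
draws = sorted(random.gauss(0.0, 1.0) for _ in range(N))
est = draws[int(p * N)]
```

With N = 100,000 the predicted standard error is well under 0.01, so the empirical 95% fractile lands within a few hundredths of the exact 1.6449; quadrupling N halves the error.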

_______licas (Luis A. Afonso)

Luis A. Afonso, Mar 11, 2007
10. ### Jack Tomsky (Guest)

IF we are able to deduce, by the first principles,

This quote from V.K. Stokes has nothing to do with confidence intervals, the subject of your thread.

Jack

Jack Tomsky, Mar 11, 2007
11. ### Luis A. Afonso (Guest)

wwwpub.utdallas.edu/~herve/Abdi-Lillie2007-pretty.pdf


MY VALUES

size__my 5%__my 1%___Conover 5%, 1%___Abdi 5%, 1%
_10___0.264___0.305___.258_.294__.2616_.3037
_15___0.220___0.255___.220_.257__.2196_.2545
_20___0.192___0.224___.190_.231__.1920_.2226
_25___0.174___0.202___.173_.200__.1726_.2010
_30___0.159___0.185___.161_.187__.1590_.1848
_35___0.148___0.172_____________.1478_.1720
_40___0.139___0.161_____________.1386_.1616
_45___0.131___0.152_____________.1309_.1525
_50___0.124___0.145_____________.1246_.1457

(for each sample size, 500,000 samples were simulated
in my work, 100,000 by Abdi & Molin).

JACK TOMSKY is so unlearned and shameless that he deserves to be exposed every time he posts an opinion on Hypothesis Tests. Less experienced people should bear in mind that HE IS A CLOWN.

Readers: do appreciate what I found on the WEB.

*** Lilliefors/Van Soest's test of normality ***

1. OVERVIEW

The normality assumption is at the core of the majority of standard statistical procedures, and it is important to be able to test this assumption. In addition, showing that a sample does not come from a normally distributed population is sometimes of importance per se. Among the many procedures used to test this assumption, one of the most well-known is a modification of the Kolmogorov-Smirnov test of goodness of fit, generally referred to as the Lilliefors test for normality (or Lilliefors test for short). This test was developed independently by Lilliefors (1967) and by Van Soest (1967). The null hypothesis for this test is that the error is normally distributed (i.e. there is no difference between the observed distribution and the normal distribution). The alternative hypothesis is that the error is not normally distributed.
Like most statistical tests, this test of normality defines a criterion and gives its sampling distribution. When the probability associated with the criterion is smaller than a given [alpha]-level, the alternative hypothesis is accepted (i.e. we conclude that the sample does not come from a normal distribution). An interesting peculiarity of the Lilliefors test is the technique used to derive the sampling distribution of the criterion. In general, mathematical statisticians derive the sampling distribution of the criterion using analytical techniques. However, in this case that approach fails, and consequently Lilliefors decided to calculate an approximation of the sampling distribution by using the Monte Carlo technique.
Essentially the procedure consists of extracting a large number of samples from a normal population and computing the value of the criterion for each of these samples. The empirical distribution of the values of the criterion gives an approximation of the sampling distribution of the criterion under the null hypothesis.
Specifically, both Lilliefors and Van Soest used, for each sample size chosen, 1000 random samples derived from a standardized normal distribution to approximate the sampling distribution of a Kolmogorov-Smirnov criterion of goodness of fit. The critical values given by Lilliefors and Van Soest are quite similar, the relative error being of the order of 10^(-2).
According to Lilliefors (1967), this test of normality is more powerful than other procedures for a wide range of nonnormal conditions. Dagnelie (1968) indicated, in addition, that the critical values reported by Lilliefors can be approximated by an analytical formula. Such a formula facilitates writing computer routines because it eliminates the risk of creating errors when keying in the values of the table. Recently, Molin and Abdi (1998) refined the approximation given by Dagnelie and computed new tables using a larger number of runs (i.e. K = 100,000) in their simulations. ***
(End of citation).
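The Lilliefors criterion quoted above, the maximum distance between the empirical CDF and a normal CDF fitted with the sample's own mean and standard deviation, can be sketched in Python (an illustrative translation, not the authors' code):

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_stat(x):
    """Max distance between the empirical CDF and the normal CDF
    fitted with the sample's own mean and standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    xs = sorted(x)
    d = 0.0
    for i, v in enumerate(xs):
        F = normal_cdf((v - m) / sd)
        # the empirical CDF jumps from i/n to (i+1)/n at each order statistic
        d = max(d, abs(F - (i + 1) / n), abs(F - i / n))
    return d

random.seed(5)
d = lilliefors_stat([random.gauss(0.0, 1.0) for _ in range(30)])
```

Repeating this for many simulated normal samples and taking empirical fractiles of `d` reproduces critical values of the kind tabulated above.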

____TOMSKY IS ABSOLUTELY K.O.!!!!!

____licas (Luis A. Afonso)

Luis A. Afonso, Mar 12, 2007
12. ### Jack Tomsky (Guest)

wwwpub.utdallas.edu/~herve/Abdi-Lillie2007-pretty.pdf

Although there is no evidence that anyone has ever used any of Afonso's faulty statistics, it is important that his errors be corrected so that no one will ever think that confidence levels and significance levels are synonymous, that null hypotheses are never allowed to be accepted, and that no one can tell if 8/13 is greater than 5/13.

Jack

Jack Tomsky, Mar 12, 2007
13. ### Luis A. Afonso (Guest)

YES, I REPEAT MY STATEMENT

*** ... allowed to be accepted, and that no one can tell if 8/13 is greater than 5/13. ***

In the proper CONTEXT I never denied it.
The NOTATION *a/b* I adopted with a precise meaning: there are *a* successes in *b* trials. I wasted my time writing a full thread making this clear. You, unethically, and your *boss* Bob Ling intentionally read the notation as plain fractions (in order to attack me)!!! And you keep repeating it ad nauseam with the same purpose.
When I wrote (as you say) 8/13 > 5/13, a unique interpretation is valid: comparing the event 8 successes in 13 trials with 5 successes in 13 trials, we can state that the latter is less favorable (to successes) at the alpha significance level.
(I do not remember exactly, but I think it was 5%.)
IS THIS THE LAST TIME YOU USE THIS *TRASH* TO BULLY ME? IS IT?

_________licas (Luis A. Afonso)

Luis A. Afonso, Mar 12, 2007
14. ### Luis A. Afonso (Guest)

In a DOZEN posts, Jack Tomsky wanted to stop my job of finding the critical values of the Jarque-Bera test. He faced the drawback of not being successful.
Meanwhile two points were obvious:
1) Total ignorance of the existence of this test (showing very weak concern to stay updated, even from the Web's material). This test has been known since 1980.
2) The most serious: ignorance of a 40-year-old technique for reaching confidence intervals (by simulation).
In his *opaque* mind
____a confidence interval can only be obtained through a real sample, and it is unique.
Consequently, for him, the procedure:
a) simulating a great number of samples (1 million),
b) evaluating for each of them the sample statistic under study,
c) and getting from this empirical distribution the quantiles of interest for the test
is WRONG, ABUSIVE, CONDEMNABLE.

This procedure, used since 1967 thanks to H. Lilliefors, is currently employed for the goal in view.
To ignore it nowadays is ABSOLUTELY NOT ACCEPTABLE to statistically learned people.

_______licas (Luis A. Afonso)

Luis A. Afonso, Mar 13, 2007
15. ### Luis A. Afonso (Guest)

Test J-B: POWER for exponential samples

Conventionally *beta* denotes the probability of making a type II error (i.e., accepting the hypothesis H0 when we should not).
*** The power, 1 - beta, is the probability of rejecting H0 when we should. ***
When we are dealing with a GOF (goodness of fit) test, the null hypothesis is H0: the sample was drawn from the population with law W. The power is the probability of rejecting H0 when it is false, i.e., when the population has a law different from W, therefore when the alternative hypothesis, Ha, holds.
This time we test random samples from the exponential law with density
_____ f(x) = (1 / L) * exp(-x / L) ___ L real positive,
0 <= x < infinity.
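A draw from this law is obtained by inverse transform: F(x) = 1 - exp(-x/L), so X = -L * ln(1 - U) with U uniform on (0,1). A Python sketch (illustrative; the program below is the thread's BASIC):

```python
import math
import random

def exp_sample(L, n, rng):
    """Inverse-transform sampling for f(x) = (1/L) exp(-x/L):
    F(x) = 1 - exp(-x/L), so X = -L * ln(1 - U), U ~ uniform(0,1)."""
    return [-L * math.log(1.0 - rng.random()) for _ in range(n)]

rng = random.Random(6)
xs = exp_sample(2.0, 50_000, rng)
mean = sum(xs) / len(xs)   # should be close to L = 2.0
```

As the note under the table observes, the JB statistic is scale-invariant, so the particular value of L does not affect the power.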

For alpha=5% exponential samples (L=1):

__N______________Power
__10______________0.332__
__15______________0.496__
__20______________0.631__
__25______________0.734__
__30______________0.821__
__35______________0.884__
__40______________0.928__
__45______________0.957__
__50______________0.977__

(Note: the power does not vary with L.)

_______licas (Luis A. Afonso)

REM "JBexp" - power of the JB normality test for exponential samples
CLS
DEFDBL A-Z
PRINT " JB test for exp. distr. "
INPUT " LAMBDA = "; lbd
INPUT " sample size = "; nn
DIM w(1, 50)
REM 5% critical values of the JB statistic for sample sizes 10 to 50
DATA 2.54,2.71,2.87,3.02,3.16,3.29,3.41,3.52
DATA 3.62,3.72,3.81,3.89,3.96,4.03,4.09,4.15
DATA 4.21,4.26,4.31,4.36,4.40,4.44,4.48,4.52
DATA 4.56,4.59,4.62,4.66,4.69,4.72,4.74,4.77
DATA 4.80,4.82,4.85,4.87,4.89,4.91,4.92,4.94
DATA 4.95
FOR t = 10 TO 50: READ w(1, t): NEXT t
jc = w(1, nn)
PRINT jc
DIM x(nn)
all = 40000
RANDOMIZE TIMER: REM seed once; reseeding inside the loop can repeat values
FOR k = 1 TO all
LOCATE 4, 50
PRINT USING "##########"; all - k
s = 0
FOR i = 1 TO nn
REM inverse transform for f(x) = (1/L)*exp(-x/L): x = -L*LOG(1 - U)
x(i) = -lbd * LOG(1 - RND)
s = s + x(i) / nn
NEXT i
m1 = s: m2 = 0: m3 = 0: m4 = 0
FOR j = 1 TO nn: d = x(j) - m1
m2 = m2 + d * d / nn
m3 = m3 + d * d * d / nn
m4 = m4 + d * d * d * d / nn
NEXT j
SK = m3 / (m2 ^ 1.5)
Ku = m4 / (m2 * m2)
JB = (nn / 6) * (SK * SK + (Ku - 3) * (Ku - 3) / 4)
IF JB > jc THEN ww = ww + 1
LOCATE 6, 50
REM running power estimate: fraction of samples with JB above jc
PRINT USING "#.###"; ww / k
NEXT k

Luis A. Afonso, Mar 13, 2007
16. ### Luis A. Afonso (Guest)

J-B test, POWER for Chi-square

From Wikipedia:

*** The power of a statistical test is the probability that the test will reject a false null hypothesis (that it will not make a Type II error). As power increases, the chances of a Type II error decrease, and vice versa. The probability of a Type II error is referred to as *beta*.
Statistical power depends on:
a)__the statistical significance criterion used in the test
b)__the size of the difference or the strength of the similarity (that is, the effect size) in the population ***
____________________________

TABLE
Jarque-Bera normality test, 5% significance level: POWER for Chi-squared distributions, df degrees of freedom.

______df=3_______5_______7_______10__
N=
__10__0.251____0.176____0.146____0.117_
__20__0.492____0.351____0.276____0.216_
__30__0.687____0.500____0.396____0.313_
__40__0.821____0.635____0.506____0.394_
__50__0.913____0.746____0.617____0.480_
_____________________________________

For each distribution (column) the power increases from N=10 to 50, whereas for each line (N constant) it decreases as df increases, because the Chi-squared distributions become progressively more alike to the normal one. For this reason the Jarque-Bera test is progressively less able to distinguish them.

_________licas (Luis A. Afonso)

REM "JBchi" - power of the JB normality test for Chi-squared samples
CLS
DEFDBL A-Z
PRINT " JB test for CHI "
INPUT " sample size = "; nn
INPUT " df = "; df
pi = 4 * ATN(1)
DIM w(1, 50)
REM 5% critical values of the JB statistic for sample sizes 10 to 50
DATA 2.54,2.71,2.87,3.02,3.16,3.29,3.41,3.52
DATA 3.62,3.72,3.81,3.89,3.96,4.03,4.09,4.15
DATA 4.21,4.26,4.31,4.36,4.40,4.44,4.48,4.52
DATA 4.56,4.59,4.62,4.66,4.69,4.72,4.74,4.77
DATA 4.80,4.82,4.85,4.87,4.89,4.91,4.92,4.94
DATA 4.95
FOR t = 10 TO 50: READ w(1, t): NEXT t
jc = w(1, nn)
DIM x(nn)
all = 40000
RANDOMIZE TIMER: REM seed once; reseeding inside the loop can repeat values
FOR k = 1 TO all
LOCATE 4, 50
PRINT USING "##########"; all - k
s = 0
FOR i = 1 TO nn: x(i) = 0
REM Chi-squared(df) draw: sum of df squared standard normals (Box-Muller)
FOR dgg = 1 TO df
a = SQR(-2 * LOG(RND))
x = a * COS(2 * pi * RND)
x(i) = x(i) + x * x
NEXT dgg
s = s + x(i)
NEXT i
m1 = s / nn: m2 = 0: m3 = 0: m4 = 0
FOR j = 1 TO nn: d = x(j) - m1
m2 = m2 + d * d / nn
m3 = m3 + d * d * d / nn
m4 = m4 + d * d * d * d / nn
NEXT j
SK = m3 / (m2 ^ 1.5)
Ku = m4 / (m2 * m2)
JB = (nn / 6) * (SK * SK + (Ku - 3) * (Ku - 3) / 4)
IF JB > jc THEN ww = ww + 1
LOCATE 6, 50
REM running power estimate: fraction of samples with JB above jc
PRINT USING "#.###"; ww / k
NEXT k: END

Luis A. Afonso, Mar 14, 2007
17. ### Luis A. Afonso (Guest)

JB by Bootstrap: reporting a failure

The procedure

From a unique normal n-sized *source sample*, a set of B bootstrap samples (of the same size) is simulated and the JB statistic evaluated for each.
Analysing this set, I count how many of these *pseudo-samples* have JB values greater than the 5% significance-level critical value. The frequency of this occurrence is the *bootstrap significance level* (BSL).
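The procedure just described can be sketched in Python (illustrative; B is kept small, and the critical value jc = 3.81 for n = 20 is an assumption read from the DATA table in the programs above):

```python
import random

def jb_stat(x):
    """Jarque-Bera statistic from the sample's central moments."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    S = m3 / m2 ** 1.5
    K = m4 / m2 ** 2
    return (n / 6.0) * (S * S + (K - 3.0) ** 2 / 4.0)

rng = random.Random(7)
n, B = 20, 1000      # fewer bootstraps than the 4000 per source used above
jc = 3.81            # assumed 5% critical value for n = 20 (thread's table)

source = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one normal source sample
exceed = 0
for _ in range(B):
    # resample the source with replacement, same size
    boot = [rng.choice(source) for _ in range(n)]
    exceed += jb_stat(boot) > jc
bsl = exceed / B     # bootstrap significance level for this source
```

Repeating this over many source samples gives the spread of BSL values reported below.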
_______________________________________
size = 10
100 *sources* each one Bootstrapped 4000 times ____values from 9% to 81%, mode = 15% , with 8 occurrences.

size = 20
idem
____values from 6% to 84%, mode = 8% with 12 occurrences.

size = 30
idem
____values from 5% to 73%, mode = 6% with 12 occurrences.

size = 40
idem
____values from 5% to 100%, mode = 6% with 15 occurrences.

size = 50
idem
____values from 5% to 99%, mode = 8% with 13 occurrences.

_______________________________________
size = 10
100 *sources* each one Bootstrapped 10000 times____values from 8% to 79%, mode = 12%-13%, with 9 occurrences each.

Conclusion
The Bootstrap doesn't work for the Jarque-Bera test.

________licas (Luis A. Afonso)

Luis A. Afonso, Mar 15, 2007
18. ### Luis A. Afonso (Guest)

Significance level, alpha, by CDF

Acceptance (non-rejection) interval,
right bounded: the interval (-infinity, b] such that
________1 - alpha = F(b) = P(X <= b)

The rejection region is (b, infinity), defined by
________alpha = 1 - F(b) = P(X > b)

This way of defining them is the same whether the distribution is continuous at *b*, i.e. p(X = b) = 0, or has a jump at *b*.
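When the distribution has jumps, an exact alpha may be unattainable, and one reports the attained level alpha = P(X > b) instead. A small sketch with a Binomial(10, 0.5) example (an illustration added here, not from the posts):

```python
from math import comb

# For a discrete statistic the CDF is a step function, so there may be
# no b with F(b) = 1 - alpha exactly; the attained level is P(X > b).
n, p = 10, 0.5
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def attained_alpha(b):
    """P(X > b) for the rejection region (b, infinity)."""
    return sum(pmf[k] for k in range(b + 1, n + 1))

alphas = {b: attained_alpha(b) for b in range(n + 1)}
```

Here no cut-off attains alpha = 0.05 exactly: b = 7 gives about 0.0547 and b = 8 gives about 0.0107, so the tester must pick one of the attainable levels.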

Luis A. Afonso, Mar 16, 2007
19. ### Jack Tomsky (Guest)

Significance level, alpha, by CDF

What happens if there is no b such that 1- alpha = F(b)? Then your acceptance region is undefined.

Jack

Jack Tomsky, Mar 16, 2007
20. ### Luis A. Afonso (Guest)

Jack Tomsky wrote:

*** What happens if there is no b such that 1- alpha = F(b)? Then your acceptance region is undefined. Jack ***

My response

It seems to me this is *homework*, *master* Jack. You should tell us what you have learned so far in this matter. Ask your teacher to direct you the right way.

________licas (Luis A. Afonso)

Luis A. Afonso, Mar 16, 2007