A cow´s story

Discussion in 'Scientific Statistics Math' started by Luis A. Afonso, Dec 1, 2010.

  1. A cow´s story


    MCM (Monte Carlo Method) could be a good choice to provide Hypotheses Tests correct p-values to which only an approximate distribution of the test statistics is known.
    I’ll illustrate by solving a problem issued in Probability and Statistical Inference /Robert V. Hogg, Elliot A. concerning the Chi-square gof (goodness of fit) test proposed by K. Pearson (1900).
    Data is the fat production (in milk) by 90 cows, one intend though follows a normal (Gaussian) Distribution, H0. In order to follow the rule: to expect more than five values by class we choose to divide data in k= 15 classes, then 6 values are expected in each equal probability one. Therefore the class extremes were (-infinity, -1.5010), (-1.5010, -1.1108), . . . , (1.5010, +infinity) and distribution contents were (after ordering):
    ___2__4__4__5__5__5__5__6__6__6__6__7__7__10__12__
    Chi= 13.66(6): Chi-square, 15 - 1 - 2 = 12 degrees of freedom and p(by chi-square) = 0.104, with alpha=5% there are no sufficient evidence to reject H0.
    What happens if we do not want to approximate the test statistics and use random numbers instead simulating the class contents is that one get p = 0.150 with 400000 simulated samples. This result assures that the null hypotheses should not be rejected.

    Luis

    REM "butter1"
    CLS : PRINT : PRINT
    PRINT " Butterfat (in ponds) produced by 90 COWS"
    DEFDBL A-Z
    INPUT " all = "; all
    n = 90
    DIM x(90), L(91), h(50), inn(50)
    DATA 486,537,513,583,453,510,570,500,458,555
    DATA 618,327,350,643,500,497,421,505,637,599
    DATA 392,574,492,635,460,696,593,422,499,524
    DATA 539,339,472,427,532,470,417,437,388,481
    DATA 537,489,418,434,466,464,544,475,608,444
    DATA 573,611,586,613,645,540,494,532,691,478
    DATA 513,583,457,612,628,516,452,501,453,643
    DATA 541,439,627,619,617,394,607,502,395,470
    DATA 531,526,496,561,491,380,345,274,672,509
    FOR i = 1 TO 90: READ x(i): x = x(i): sx = sx + x / 90
    sxx = sxx + x * x: NEXT i
    mean = sx
    var = sxx - n * mean * mean: var = var / (n - 1)
    sd = SQR(var)
    PRINT USING " ###.### "; mean; sd
    FOR i = 1 TO n: x(i) = (x(i) - mean) / sd: NEXT i
    PRINT " 15 classes <> 6/class , p=1/15= 0.016666... "
    E = 6
    DATA -100,-1.5010, -1.1108,-.8416, -.6229,-.4307,-.2533,-.0837
    DATA .0837, .2533,.4307, .6229,.8416,1.1108,1.5010,100
    FOR j = 1 TO 16
    READ L(j): PRINT USING "## ####.#### "; j; L(j); : NEXT j : PRINT : PRINT
    FOR tt = 1 TO n: h = x(tt)
    FOR y = 1 TO 15
    IF h < L(y + 1) AND h > L(y) THEN h(y) = h(y) + 1
    NEXT y: NEXT tt
    PRINT " contents/class "
    FOR u = 1 TO 15: PRINT USING "####"; h(u);
    chi = chi + (h(u) - E) ^ 2 / E: NEXT u
    COLOR 14: PRINT USING " ###.###"; chi
    COLOR 7
    FOR i = 1 TO all
    LOCATE 16, 50: PRINT USING "########"; all - i
    qqui = 0: RANDOMIZE TIMER
    FOR t = 1 TO n: g = INT(RND * 16) + 1
    inn(g) = inn(g) + 1
    NEXT t
    qqui = 0
    FOR f = 1 TO 16: qqui = qqui + (inn(f) - E) ^ 2 / E
    NEXT f
    COLOR 7
    IF qqui > chi THEN outt = outt + 1 / all
    FOR t = 1 TO 16: inn(t) = 0: NEXT t
    NEXT i
    LOCATE 17, 60: COLOR 12
    PRINT USING " p value= #.### "; outt
    END
     
    Luis A. Afonso, Dec 1, 2010
    #1
    1. Advertisements

  2. 10 classes, 9values/classe

    Let be k=10, test statistics is 4.444; Chi-Square critical value (95% confidence level) is 0.87984 = 0.880.
    By Monte Carlo (program V-10) I got (5 essays): 0.878, 0.873, 0.875, 0.875, 0.874;
    _______ mean = 0.875 +/- 0.001.
    This correct value of probability does validate the one obtained by the Chi- Square Distribution.

    Luis

    REM "V-10"
    CLS : PRINT : PRINT " ****** V-10 ****** ":
    PRINT " Butterfat (in pounds) produced by 90 COWS"
    DEFDBL A-Z
    INPUT " all = "; all
    n = 90
    DIM x(90), L(91), h(50), inn(50)
    DATA 486,537,513,583,453,510,570,500,458,555
    DATA 618,327,350,643,500,497,421,505,637,599
    DATA 392,574,492,635,460,696,593,422,499,524
    DATA 539,339,472,427,532,470,417,437,388,481
    DATA 537,489,418,434,466,464,544,475,608,444
    DATA 573,611,586,613,645,540,494,532,691,478
    DATA 513,583,457,612,628,516,452,501,453,643
    DATA 541,439,627,619,617,394,607,502,395,470
    DATA 531,526,496,561,491,380,345,274,672,509
    FOR i = 1 TO 90: READ x(i): x = x(i)
    sx = sx + x / 90
    sxx = sxx + x * x: NEXT i
    mean = sx
    ssd = sxx - n * mean * mean: var = ssd / (n - 1)
    sd = SQR(var)
    PRINT USING " ###.### "; mean; sd
    FOR i = 1 TO n: x(i) = (x(i) - mean) / sd: NEXT i
    PRINT " 10 classes <> 9/class , p=1/10= 0.01 "
    kla = 10: E = n / kla
    REM
    DATA -100, -1.2816, -0.8416, -0.5244,-0.2533, 0.0000
    DATA 0.2533,0.5244,0.8416,1.2816,100
    FOR j = 0 TO kla
    READ L(j)
    PRINT USING "## ####.#### "; j; L(j); : NEXT j
    PRINT : PRINT
    REM
    FOR tt = 1 TO n: h = x(tt)
    FOR y = 0 TO kla - 1
    IF h < L(y + 1) AND h > L(y)
    THEN h(y + 1) = h(y + 1) + 1
    NEXT y: NEXT tt
    PRINT " contents/class "
    FOR u = 1 TO kla: PRINT USING "####"; h(u);
    chi = chi + (h(u) - E) ^ 2 / E: NEXT u
    COLOR 14
    PRINT USING " ###.###"; chi: COLOR 7
    FOR i = 1 TO all
    LOCATE 16, 50
    PRINT USING "########"; all - i
    qqui = 0: RANDOMIZE TIMER
    FOR t = 1 TO n: g = INT(RND * kla) + 1
    inn(g) = inn(g) + 1
    NEXT t
    qqui = 0
    FOR f = 1 TO kla
    qqui = qqui + (inn(f) - E) ^ 2 / E: NEXT f
    COLOR 7
    IF qqui > chi THEN outt = outt + 1 / all
    FOR t = 1 TO kla: inn(t) = 0: NEXT t
    NEXT i
    LOCATE 17, 60: COLOR 12
    PRINT USING " p value= #.### "; outt
    END
     
    Luis A. Afonso, Dec 4, 2010
    #2
    1. Advertisements

  3. All Three

    It’s a routine to provide CRITICAL VALUES (right tail)


    _n=90/400 000 _______________5%_______1%___ __________________JB_______5.37______12.45__
    __________________ U_______3.65______ 6.68__
    __________________ V_______2.81______ 7.86__
    _____/1 000 000 ____JB_______5.36_____ 12.49__
    __________________ U_______3.65______ 6.68__
    __________________ V_______2.81______ 7.84__

    The Jarque-Bera Test significance alpha is unable to assure a similar alpha for U and V as well.

    DATA COWS provided JB=0.577, U=0.269, V=0.308 showing that if the test is very LOW there is not conflict with the U and V critical values: all three are not rejected.
    However, in general, the error remains: The Jarque –Bera Test is not reliable, it should be banned, because acceptance DOES NOT ASSURE us that Skewness and Kurtosis are acceptable at the same confidence level.

    Luis
     
    Luis A. Afonso, Dec 19, 2010
    #3
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.