Reef Fish Statistics for Dummies: Applied Simple Regression

Discussion in 'Scientific Statistics Math' started by Reef Fish, Oct 4, 2006.

  1. Reef Fish

    Reef Fish Guest

    That is certainly true of YOU, m00es.

    "The same" in (2) means "Y needs to be normal" as in (1).

    But Y is the COLUMN of numbers used for either the calculation
    of the correlation coefficient r*, OR used in a simple regression.

    The column of Y in a simple regression, in THEORY, comes
    from n different normal distributions, depending on the n X's,
    if the ERRORS in the model are N(0, sigma^2). So, in theory,
    it is already non-normal. But in practice, the OBSERVED
    data for Y (as an aggregate in the column of numbers)
    doesn't have to follow ANY distribution at all, and therefore
    there is NOTHING to VALIDATE about Y when one is
    doing a simple regression problem.

    There is absolutely no NEED for the column of Y to be normal.

    In fact, most of the time in a simple regression problem, the
    Y is NON-NORMAL -- Y can easily be bimodal when the
    data come from two very different normal distributions
    with very different means.
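A minimal Python sketch of the bimodal case described above, using only the standard library. The setup (X in two clusters at 0 and 10, slope 2, N(0, 1) errors) is a hypothetical illustration, not data from the thread: every Y_i is normal around its own mean beta0 + beta1 x_i, yet the pooled column of Y splits into two well-separated groups.

```python
import random
import statistics

# Hypothetical setup (not from the thread): X sits in two well-separated
# clusters, the errors are N(0, sigma^2), and the true slope is nonzero.
random.seed(1)
beta0, beta1, sigma = 0.0, 2.0, 1.0
xs = [0.0] * 50 + [10.0] * 50          # two clusters of design points
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

# Each Y_i is N(beta0 + beta1 * x_i, sigma^2), but the column of Y as a
# whole is a 50/50 mixture of N(0, 1) and N(20, 1): clearly bimodal.
low = [y for y in ys if y < 10.0]
high = [y for y in ys if y >= 10.0]
print(len(low), len(high))
print(round(statistics.mean(low), 2), round(statistics.mean(high), 2))
```

The split at 10 is ten error standard deviations from either cluster mean, so the two humps never overlap in practice.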

    You should REMAIN at this point until you understand this
    very first step. And remember, the normality of Y or
    nonnormality of it has to be VALIDATED by DATA, not
    by any hypothesis to be tested in any regression model.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 13, 2006
    #41

  2. Reef Fish

    m00es Guest

    Apparently, you are having difficulties distinguishing that which is
    true in the population and that which we observe in a sample. In the
    population, we assume that the model describing the relationship
    between Y and X is given by Y = beta0 + beta1 X + e, where e ~ iid N(0,
    sigma^2). This is a statement about the population. Now if in reality
    beta1 = 0 (i.e., Y and X are completely unrelated), then Y = beta0 + e,
    where e ~ iid N(0, sigma^2). Therefore, in the population Y ~ N(beta0,
    sigma^2). Hence, if Y and X are unrelated, Y is normal.

    Now, if we take a random sample from that population, then we are
    either taking a random sample from a population where beta1 = 0 or
    where beta1 != 0. In other words, we are either taking a random sample
    from Y ~ N(beta0, sigma^2) or from Y ~ N(beta0 + beta1 X, sigma^2). We
    don't know which one of these two cases holds. But we take our random
    sample, fit the model, calculate b1 and s(b1) and then b1/s(b1). If H0:
    beta1 = 0 holds (i.e., Y is normal), then b1/s(b1) will be a realization
    of a random variable following a central t-distribution with n - 2
    degrees of freedom. If H0 does not hold, then b1/s(b1) is a realization
    of a random variable following a non-central t-distribution.
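The null case can be checked by simulation. The sketch below assumes a fixed design x = 0, 1, ..., 19 (n = 20), normal errors, and beta1 = 0; the constant 2.100922 is the 0.975 quantile of the t distribution with 18 degrees of freedom. The helper `slope_t` is a hypothetical name for a plain least-squares computation of b1/s(b1).

```python
import math
import random

def slope_t(xs, ys):
    """Return b1 / s(b1) for the least-squares fit of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ssx
    b0 = ybar - b1 * xbar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    return b1 / math.sqrt(mse / ssx)

random.seed(2)
n, reps = 20, 20000
xs = [float(i) for i in range(n)]       # fixed design points (assumption)
T_CRIT = 2.100922                       # 0.975 quantile of t with 18 d.f.
beta0, sigma = 5.0, 3.0                 # arbitrary values; H0: beta1 = 0

rejections = 0
for _ in range(reps):
    ys = [beta0 + random.gauss(0.0, sigma) for _ in xs]   # beta1 = 0
    if abs(slope_t(xs, ys)) > T_CRIT:
        rejections += 1
rate = rejections / reps
print(rate)  # should be close to the nominal 0.05
```

Under these assumptions the two-sided rejection rate should land near the nominal 5%; giving beta1 a nonzero value raises it, which is the non-central case.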

    m00es
     
    m00es, Oct 13, 2006
    #42

  3. Reef Fish

    Reef Fish Guest

    If you're going to follow up discussing STATISTICS, you follow
    up with what I posted, not rehashing your old ERRORS.

    The DATA is the SAMPLE from one or more populations.

    The DATA is what you use to VALIDATE the necessary
    assumption behind any statistical procedure.

    The DATA does not depend on what you model or what
    parameter value you want to test.

    You get THAT part into your numb skull and go back to your
    own step (2).

    If you want to test R, using T, the Y must be Normal.

    So you MUST VALIDATE the normality of Y.


    If you want to test the slope in a regression, there is NO
    distributional assumption on the aggregate Y (it came from
    n different distributions in theory).

    There is NOTHING to VALIDATE about Y.
    You wait until you have observed residuals, then you
    validate the distribution of the errors.
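As a rough sketch of checking residuals rather than Y, the code below fits the line by least squares and compares simple moment statistics (sample skewness and excess kurtosis) of Y and of the residuals. These moment checks are only an informal screen chosen for illustration; the thread does not say which validation method is intended, and a normal probability plot or a formal test such as Shapiro-Wilk is more usual. The data are simulated from a hypothetical design with X at 0 and 10 (50 points each), slope 2 and N(0, 1) errors, so Y is bimodal while the errors are normal.

```python
import random

def moments(v):
    """Sample skewness and excess kurtosis (simple moment estimators)."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    m4 = sum((a - m) ** 4 for a in v) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

random.seed(3)
xs = [0.0] * 50 + [10.0] * 50            # hypothetical two-cluster design
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]

# Ordinary least-squares fit and its residuals.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
ssx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ssx
b0 = ybar - b1 * xbar
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

y_skew, y_excess = moments(ys)
r_skew, r_excess = moments(resid)
print("Y:        skew %.2f, excess kurtosis %.2f" % (y_skew, y_excess))
print("residual: skew %.2f, excess kurtosis %.2f" % (r_skew, r_excess))
```

For this design Y's excess kurtosis comes out strongly negative (near -2, typical of a two-humped distribution), while the residuals' skewness and excess kurtosis should stay near the normal value of 0, within sampling error.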

    If you FAIL the validation in BOTH, none of your
    inference based on the test statistic is valid.

    If you SUCCEED in the validation of BOTH, then
    both are valid, as I had explained to Ewart Shaw
    in his given conditions and insistence.

    It's in the ACTUAL problem of testing the equality
    of means of two populations that

    T(n-2) may be validly used for testing R while
    not valid for testing beta1, if Y is validated to be
    normal, but the residuals of the regression
    seriously violated the normal validation,

    OR

    T(n-2) may be INVALID for testing R because
    neither X nor Y is normal; but perfectly valid
    for testing beta1 because the RESIDUALS
    are validated to be normal.

    At this point, I must ask OTHERS to explain the
    above to m00es, or if anyone ELSE still doesn't
    follow, ask questions about these two:

    ===============================
    T(n-2) may be validly used for testing R while
    not valid for testing beta1, if Y is validated to be
    normal, but the residuals of the regression
    seriously violated the normal validation,

    OR

    T(n-2) may be INVALID for testing R because
    neither X nor Y is normal; but perfectly valid
    for testing beta1 because the RESIDUALS
    are validated to be normal.
    ===============================

    because there is no other way on earth or in hell
    that FACT can be explained in a more direct way
    that is consistent with standard practice of
    VALIDATION of statistical ASSUMPTIONS in
    procedures before conducting any inference or test.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 13, 2006
    #43
  4. Reef Fish

    m00es Guest

    You can just keep on ignoring what I wrote, but it's still correct. I
    am not talking about verifying any assumptions. I am not even talking
    about the correlation coefficient. I am just trying to teach something
    to you that is apparently very difficult to understand. And that is:
    the distribution of b1/s(b1) is only t-distributed with n-2 degrees of
    freedom when H0: beta1 = 0 holds, which implies that Y is normal. Once
    you understand this point, then we can proceed and talk about ways to
    verify assumptions. But for now, that is reaching too far when you
    can't even understand something so simple.

    Here is the proof that b1/s(b1) ONLY follows a t-distribution with
    n - 2 degrees of freedom when Y is normal. The model: Y = beta0 +
    beta1 X + e, where e ~ iid N(0, sigma^2). Let SSX = sum(x - xbar)^2.

    b1 ~ N(beta1, sigma^2 / SSX)
    MSE ~ chi^2(n-2) sigma^2 / (n - 2), independent of b1
    s^2(b1) = MSE / SSX

    Therefore:

    b1/s(b1) ~ N(beta1, sigma^2 / SSX) / sqrt{ chi^2(n-2) sigma^2 / [(n - 2) SSX] }
             ~ N(beta1 sqrt(SSX) / sigma, 1) / sqrt( chi^2(n-2) / (n - 2) ).

    When beta1 = 0, then:

    b1/s(b1) ~ N(0,1) / sqrt( chi^2(n-2) / (n - 2) )

    and that is distributed t with n - 2 degrees of freedom. But ONLY when
    beta1 = 0. Otherwise, it will be non-central t.

    And when beta1 = 0, then in the population Y = beta0 + e, hence Y ~
    N(beta0, sigma^2).

    So, ONLY when Y is normal (i.e., beta1 = 0) will b1/s(b1) be
    t-distributed with n - 2 degrees of freedom.
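The derivation can be sanity-checked numerically. Because b1 and MSE are independent under the normal-error model, b1/s(b1) is non-central t with n - 2 degrees of freedom and noncentrality delta = beta1 * sqrt(SSX) / sigma, which reduces to the central t when beta1 = 0. The sketch assumes x = 0, ..., 19 and sigma = 3, picks beta1 so that delta = 2, and compares the simulated mean of b1/s(b1) with the known mean of a non-central t, delta * sqrt(nu/2) * Gamma((nu-1)/2) / Gamma(nu/2). The helper `slope_t` is a hypothetical name.

```python
import math
import random

def slope_t(xs, ys):
    """Return b1 / s(b1) for the least-squares fit of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b1 / math.sqrt(mse / ssx)

random.seed(4)
n, reps, sigma = 20, 10000, 3.0
xs = [float(i) for i in range(n)]            # fixed design (assumption)
xbar = sum(xs) / n
ssx = sum((x - xbar) ** 2 for x in xs)
delta = 2.0                                  # chosen noncentrality
beta1 = delta * sigma / math.sqrt(ssx)       # slope giving that delta

ts = []
for _ in range(reps):
    ys = [1.0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]
    ts.append(slope_t(xs, ys))
sim_mean = sum(ts) / reps

# Mean of a non-central t with nu d.f. and noncentrality delta (nu > 1).
nu = n - 2
theory_mean = delta * math.sqrt(nu / 2) * math.exp(
    math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2))
print(round(sim_mean, 3), round(theory_mean, 3))
```

With 18 degrees of freedom the theoretical mean is about 2.088, slightly above delta itself, and the simulated mean should agree to within sampling error.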

    We have not yet drawn any data. These are just facts that hold when we
    are correct in assuming that in the population the relationship between
    Y and X is described by a linear model of the form Y = beta0 + beta1 X
    + e, where e ~ iid N(0, sigma^2).

    So, why don't you skip all the rhetoric and insults and accept that
    this is a fact. Or if you disagree with anything I wrote, then please
    point out where the error is (we would have to rewrite 1000's of stat
    books if you can find an error in what I wrote). But don't get started
    again about assumptions and data. We aren't even there yet.

    m00es
     
    m00es, Oct 13, 2006
    #44
  5. Reef Fish

    Reef Fish Guest

    because you repeated the same errors you made on Day 1, with
    absolutely nothing new.

    I have already put it as succinctly as possible, and asked OTHER
    readers to explain it to you, or for THEM to ask me what they
    don't understand about these two parts about the two
    hypotheses being incompatible if the DATA can validate one
    procedure and not the other.

    If you want to have anything to say, address these two, so
    OTHERS can respond to you.

    ===============================
    T(n-2) may be validly used for testing R while
    not valid for testing beta1, if Y is validated to be
    normal, but the residuals of the regression
    seriously violated the normal validation,

    OR

    T(n-2) may be INVALID for testing R because
    neither X nor Y is normal; but perfectly valid
    for testing beta1 because the RESIDUALS
    are validated to normal.
    ===============================

    because there is no other way on earth or in hell
    that FACT can be explained in a more direct way
    that is consistent with standard practice of
    VALIDATION of statistical ASSUMPTIONS in
    any statistical procedures before conducting any
    inference or test.

    The two procedures were:

    Testing correlation given DATA (X, Y) using a T
    distribution with (n-2) d.f.

    Assumption to be validated: X or Y MUST be Normal.

    Testing the slope of a simple regression line given
    DATA (X, Y) using a T distribution with (n-2) d.f.

    Assumption to be validated: The usual i.i.d. (0, sigma^2)
    assumption about the ERRORS in the linear fit.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 13, 2006
    #45
  6. Reef Fish

    m00es Guest

    I am not making any errors. We haven't even gotten yet to the issue of
    how to verify assumptions, and what the role of the correlation
    coefficient is in all of this. I am trying to do this step by step,
    because there is no point in discussing anything if you do not
    understand a very basic fact about the distribution of Y.

    All I am trying to explain to you is that Y is normal under H0: beta1 =
    0. Once you get this point, we can discuss other things. But
    apparently, you do not want to admit that I am correct.

    Look, it's very very simple. When Y = beta0 + beta1 X + e, where e ~
    iid N(0, sigma^2), then Y ~ N(beta0 + beta1 X, sigma^2). Therefore, the
    conditional distribution of Y|x is N(beta0 + beta1 x, sigma^2). And the
    marginal distribution of Y is indeed a mixture distribution. However,
    when H0: beta1 = 0 holds, then Y ~ N(beta0, sigma^2) and the marginal
    distribution of Y is normal.
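A small simulation of the marginal distribution. Assuming a hypothetical design where X is 0 or 10 with equal probability and N(0, 1) errors, the marginal variance of Y is sigma^2 + beta1^2 Var(X) = 1 + 25 beta1^2, and the excess kurtosis separates a single normal (about 0) from a two-component mixture (strongly negative).

```python
import random

def var_and_excess(v):
    """Population-style variance and excess kurtosis of a sample."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m4 = sum((a - m) ** 4 for a in v) / n
    return m2, m4 / m2 ** 2 - 3.0

random.seed(5)
design = [0.0, 10.0]        # hypothetical design: X is 0 or 10, equally often
beta0, sigma, draws = 1.0, 1.0, 20000

results = {}
for beta1 in (0.0, 2.0):
    # Draw X from the design, then Y | X ~ N(beta0 + beta1 X, sigma^2).
    ys = [beta0 + beta1 * random.choice(design) + random.gauss(0.0, sigma)
          for _ in range(draws)]
    results[beta1] = var_and_excess(ys)
    print(beta1, [round(s, 2) for s in results[beta1]])
```

With beta1 = 0 the sample variance should be near 1 and the excess kurtosis near 0; with beta1 = 2 the variance should be near 101 and the excess kurtosis close to -2.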

    Why is it so difficult for you to admit that this is true?

    m00es
     
    m00es, Oct 14, 2006
    #46
  7. Reef Fish

    Reef Fish Guest

    You not only made errors, but you were polluting the OTHER thread
    in which I asked OTHER readers to either ask THEIR questions or
    explain your errors.

    Your errors are so OBVIOUS.

    In a sense, Dick Startz explained to Kevin how you could have a valid
    set of data for testing correlations while an INVALID one for testing
    the slope of a simple regression.

    If you just keep your BIG MOUTH SHUT and use your little ears to
    LISTEN for a while, you may learn something from the various readers
    in these groups.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 14, 2006
    #47
  8. Reef Fish

    m00es Guest

    Why do you keep evading what I wrote? Under the model Y = beta0 + beta1
    X + e, where e ~ iid N(0, sigma^2), beta1 = 0 implies that Y is normal.
    Why is it so difficult for you to admit that this is true?

    m00es
     
    m00es, Oct 15, 2006
    #48
  9. Reef Fish

    Reef Fish Guest

    Because the DATA could have come from beta1 = 100,000.

    You are only TESTING if beta1 = 0.

    That's how DENSE you are.

    Read it in the thread for OTHER people and the post by Dick Startz
    that tells exactly the same reason why you've been wrong all these
    weeks, without the slightest sign of revival or recovery.

    The HOLE you dug is so deep that even if you recover, you'll be
    the laughing stock of this group for years to come, in the archives
    of sci.stat.math.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 15, 2006
    #49
  10. Reef Fish

    m00es Guest

    If the data came from model where beta1 != 0, then the marginal
    distribution of Y is not normal. That is correct. I never said
    otherwise. However, IF beta1 = 0 holds, then Y is normal. So, you are
    still unable to admit that Y is normal when beta1 = 0 holds.

    Yes, and to test beta1 = 0, we initially assume that H0: beta1 = 0
    holds. If H0 holds, then Y is normal. And if H0 holds, then b1/s(b1)
    follows a t-distribution with n - 2 degrees of freedom. If H0 does not
    hold, then Y is not normal. And then b1/s(b1) follows a non-central
    t-distribution with n - 2 degrees of freedom.

    But one more time. IF H0: beta1 = 0 holds, then Y is normal. So, again,
    why are you having such a hard time admitting that this is true?

    More insults. Zero substance.

    m00es
     
    m00es, Oct 16, 2006
    #50
  11. Reef Fish

    Reef Fish Guest

    That says it all about all you've been saying for weeks, NOT recognizing
    that the DATA one tests has nothing to do with the Ho being tested.

    It has to validate the ASSUMPTION of the regression procedure.

    That's the extent of your perpetual ignorance.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 16, 2006
    #51
  12. Reef Fish

    Russell Guest

    So you're saying that it is an impossibility for there to
    exist data which are not normally distributed for which
    beta1 = 0? I don't believe that. In fact, given your model
    Y = beta0 + beta1 X + e, if we take e to be distributed as,
    say, Cauchy then Y isn't normal even if beta1 = 0. Now
    I don't know how sensitive this analysis is to a violation
    of e being normal. As I understand things, some tests are
    more robust than others to violations of their assumptions,
    but when I was learning about data analysis I was taught
    to check for such violations. I've adopted as my creed a
    slight modification of a statement in the book on spectral
    analysis by Blackman and Tukey, _The Measurement of
    Power Spectra From the Point of View of Communication
    Engineering_: All too often the study of data requires care.
    Cheers,
    Russell
     
    Russell, Oct 16, 2006
    #52
  13. Reef Fish

    m00es Guest

    That's correct. However, Reef Fish just doesn't want to admit that the
    model Y = beta0 + beta1 X + e with e ~ iid N(0, sigma^2) implies that Y
    is normal when beta1 = 0. Certainly e could follow any other
    distribution, but then b1/s(b1) does not follow a t-distribution even
    when beta1 = 0. Neither will r * sqrt(n-2)/sqrt(1-r^2). However, even
    then b1/s(b1) and r * sqrt(n-2)/sqrt(1-r^2) will follow the exact same
    distribution (whatever it may be). Since b1/s(b1) = r * sqrt(n-2)/sqrt(1-r^2),
    they ALWAYS follow the same distribution.
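The identity b1/s(b1) = r * sqrt(n-2)/sqrt(1-r^2) is purely algebraic (it follows from SSE = (1 - r^2) Syy), so it holds for any data set, whatever the error distribution. The sketch below checks it numerically with Cauchy errors, the case raised in the previous post; the design x = 0, ..., 19 and slope 0.5 are arbitrary choices, and `slope_t` and `corr_t` are hypothetical helper names. It says nothing about which distribution the common statistic follows.

```python
import math
import random

def slope_t(xs, ys):
    """b1 / s(b1) from the least-squares fit of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b1 / math.sqrt(mse / ssx)

def corr_t(xs, ys):
    """r * sqrt(n - 2) / sqrt(1 - r^2) from the sample correlation r."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

random.seed(6)
xs = [float(i) for i in range(20)]
pairs = []
for _ in range(5):
    # Standard Cauchy errors via the inverse CDF: tan(pi * (U - 1/2)).
    ys = [0.5 * x + math.tan(math.pi * (random.random() - 0.5)) for x in xs]
    pairs.append((slope_t(xs, ys), corr_t(xs, ys)))
for a, b in pairs:
    print("%.6f  %.6f" % (a, b))   # the two columns agree to rounding error
```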

    Of course, checks on the assumption that e is normal should be carried
    out. But that has nothing to do with the equivalence of the test of
    beta1 = 0 and rho = 0.

    m00es
     
    m00es, Oct 17, 2006
    #53
  14. Reef Fish

    m00es Guest

    And again, you avoid the issue. Insults, on the other hand, seem to be
    your forte.

    If Y = beta0 + beta1 X + e, where e ~ iid N(0, sigma^2), then the
    marginal distribution of Y follows a mixture distribution (a mixture of
    normals). However, if beta1 = 0, then the marginal distribution of Y is
    normal. Why are you having such a hard time pointing out where the
    error is? Oh right, there IS no error! Yes, that would make it
    difficult to point out where the error is. I suppose throwing insults
    around is an alternative then.

    m00es
     
    m00es, Oct 17, 2006
    #54
  15. Reef Fish

    Reef Fish Guest

    That's the END of m00es's arguments all these weeks.

    Why more DOUBLE TALK?
    NONE of that had anything to do with what I said about DATA that can
    be validly used to test ONE (rho) and NOT THE OTHER (beta1).
    NOT when you don't have any e and have ONLY X and Y to test the
    correlation rho.

    Precisely. The equivalence of beta1 = 0 and rho = 0 is IRRELEVANT
    when you are testing ONLY correlation, and there you have to test
    that either X or Y is normal, and nothing about e, because e
    DOESN'T exist.

    In the end, you have proven yourself WRONG once more, trying to
    mouth dance and weasel your way out notwithstanding.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 17, 2006
    #55
  16. Reef Fish

    m00es Guest

    What double-talk??? Wait, don't answer that. It's just going to be a
    bunch of nonsense anyway.

    The tests are the same. When are you going to get that into your head?

    Once again, the test of rho = 0 is equivalent to the test of beta1 = 0.
    It's really not that difficult to understand.

    Funny; you are the one who keeps dodging my question. And I have not
    proven myself wrong. On the other hand, you have proven that you are
    unable to carry on an argument, you are unable to admit that you are
    wrong, and you lack common decency, since you revert to insults
    whenever you can.

    m00es
     
    m00es, Oct 17, 2006
    #56
  17. Reef Fish

    Reef Fish Guest

    But the tests require DIFFERENT assumptions to be validated.
    That was what never got through your thick head, because your mind
    is so mutilated by mathematical statistics in a vacuum of APPLIED
    statistics that you just couldn't understand that the validation of
    ASSUMPTIONS in a procedure has nothing to do with the Ho being tested.

    It has EVERYTHING to do with the DATA used.
    It is trivial to understand. But it is hard for m00es to understand that
    in the FORMER, one needs to validate the normality of Y and X.
    In the LATTER, X and Y can be anything. NOTHING to validate in X and Y.

    Any freshman with any sound instruction in simple regression would
    have understood the difference.

    Only a BLIND mathematical statistics student like m00es remains BLIND
    to the simple fact, and POLLUTES the entire three newsgroups with his
    incessant repetition of the same FALSEHOOD that had nothing to do with
    the issue of VALIDATION of assumptions.
    You proved it in your current post, and shot yourself in the foot with
    the same weapon.

    -- Reef Fish Bob.
     
    Reef Fish, Oct 17, 2006
    #57
  18. Validating the assumption (valid) before the main test, for example that the sample has the normal distribution, and then performing the test SEEMS a good idea (to ignorant people), but IMO it is STATISTICALLY A DISASTER.
    Why?
    Because one jumps from a well-established situation to a CONDITIONAL TEST,
    __the evaluation of p(H0|valid)__, which we cannot solve.

    ______licas (Luis A. Afonso)
     
    \Luis A. Afonso\, Oct 17, 2006
    #58
  19. Reef Fish

    Reef Fish Guest

    The keyword in the above is "IMO".

    When "IMO" comes from someone ignorant about statistics and the well-
    accepted and understood practice of the VALIDATION of assumptions in
    Data Analysis and Applied Statistics, it remains the OPINION of the
    uneducated.

    Read George Box, "Science and Statistics", JASA 1976, the article which
    I have referenced at least a dozen times to others (which was probably
    not seen by Luis A. Afonso), and let George Box teach you a lesson or
    three in what statistics is about.

    THEN, you may be ready to read dozens and dozens of books and
    articles in the statistical literature on WHY it makes sense to
    VALIDATE the assumptions behind a procedure before plunging into
    applying it.

    SIMPLE LOGIC: If the assumptions are seriously violated, then all
    the statistical results and conclusion based on the procedure are
    INVALID, WRONG, and not worth the paper they are printed on (by your
    computer program).

    -- Reef Fish Bob.
     
    Reef Fish, Oct 17, 2006
    #59
    What I got from Reef Fish's commentary:


    ___Reef Fish relying only on an *authority* (?) (and the ordinary ad hominem insult).

    What I didn't see:

    ___His opinion on why IT IS NOT A CONDITIONAL TEST.


    _______licas (Luis A. Afonso)
     
    \Luis A. Afonso\, Oct 17, 2006
    #60