Correlation: In Memory of My Former Academic Colleagues

Discussion in 'Scientific Statistics Math' started by Reef Fish, Jun 15, 2005.

  1. Reef Fish

    Reef Fish Guest

    You wouldn't know what's a "Troll" if it kicked you in the shin.

    I made my points in my two posts. If you can't address those points
    then you should be quiet.

    I am telling it like it is, in EVERY topic in which I participate
    in this (and other) newsgroups.

    I made MY POINT. The Central Limit Theorem does NOT have to apply
    to physical measurements. It's the IDEA of something being the
    result of HUNDREDS of factors that are too simplistic to attribute
    to one or even several, on CORRELATIONAL grounds (without control)
    which is unacceptable in the first place.

    You have NO rebuttal other than your non-specific and ad hominem
    remarks.

    -- Bob.
     
    Reef Fish, Jun 28, 2005
    #61

  2. Reef Fish

    Eric Bohlman Guest

    That's simply an artifact of a decision, back in the 1940s when both
    "computer" and "calculator" were job titles for persons, to standardize the
    scores to a mean of 500 (and an SD of 100) in order to make quick mental
    calculations easier. It had absolutely *nothing* to do with concerns about
    "hurting kids' feelings." That there are no scores below 200 or above 800
    simply reflects the fact that, regardless of how much people want them to
    be, norm-referenced standardized tests are simply *not* any good at making
    fine distinctions between people who score toward the extremes of the
    scale; any variation in the 0-199 or 801-1000 ranges would be nothing but
    noise. Reporting scores in those ranges would be equivalent to reporting
    measurements to more significant figures than the instrument was capable
    of, something that we've all been taught is a big no-no.
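A minimal sketch of the rescaling described above, assuming a plain linear transformation of z-scores to a mean of 500 and an SD of 100, with reported values clamped to the 200-800 range. The function name `scale_scores` and the raw scores are hypothetical, and this is not claimed to be the procedure any testing agency actually uses.

```python
from statistics import mean, pstdev


def scale_scores(raw, target_mean=500.0, target_sd=100.0, low=200.0, high=800.0):
    """Rescale raw scores to a given mean and SD, then clamp to [low, high]."""
    m = mean(raw)
    s = pstdev(raw)  # population SD of the norming group
    if s == 0:
        raise ValueError("raw scores have no spread")
    scaled = [target_mean + target_sd * (x - m) / s for x in raw]
    # Anything beyond +/- 3 SD is reported at the boundary, not as a finer number.
    return [min(high, max(low, v)) for v in scaled]


raw_scores = [12, 35, 40, 41, 44, 47, 50, 52, 55, 58, 60, 95]  # made-up data
scaled_scores = scale_scores(raw_scores)
print([round(v) for v in scaled_scores])
```

Clamping at 200 and 800 corresponds to three SDs on either side of the mean, which is where the post argues the instrument stops carrying information.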
     
    Eric Bohlman, Jun 28, 2005
    #62

  3. Reef Fish

    Guest Guest

    If someone measured something as tangible as my height or weight,
    then adjusted it because "heights & weights are normally distributed"
    and they therefore grade on a curve, then I would say (in my blunt
    Yorkshire way) that they are talking bollocks.

    If someone measured something as intangible as my IQ,
    then adjusted it "because IQ is normally distributed" etc.,
    then why should my response be any different?

    This is a serious question.
     
    Guest, Jun 28, 2005
    #63
  4. Reef Fish

    Bob Wheeler Guest

     
    Bob Wheeler, Jun 28, 2005
    #64
  5. Reef Fish

    Reef Fish Guest

    and WHY are they normally distributed? Or so are numerous OTHER
    measurements that are naturally normally distributed?

    grading on a curve on a SMALL sample is certainly bollocks. That's
    one of the reasons I NEVER grade on a curve, in ANY of my classes,
    and I criticize others for so doing.

    But if my tests are given to a LARGE population, then the test
    score TOTALS will be naturally normally distributed because of the
    Central Limit Theorem. In practice, for nearly the same tests on
    some classes I've taught for 20+ years, the aggregate of all the
    individual scores is approximately normally distributed even
    though they are not in the individual classes.

    This is a serious answer. Read first my reply to your bollock
    question. In this case, for the SAME reason as tests, if the
    ENTIRE population (or nearly so) is taking the IQ tests, the
    scores would NATURALLY be normally distributed, because of the
    SAME Central Limit Theorem effect on the SUM of the scores of
    the test items.

    I had already explained why they are normed and standardized.

    -- Bob.
     
    Reef Fish, Jun 28, 2005
    #65
  6. Reef Fish

    Reef Fish Guest

    Bob Wheeler wrote:

    RF wrote,

    I meant to say "does NOT have to apply ONLY to physical measurements".

    It's the IDEA of something being the
    That's what the intellectual OSTRICHES always say.

    -- Bob.
     
    Reef Fish, Jun 28, 2005
    #66
  7. Reef Fish

    Jerry Dallal Guest

    I think Bob is advocating for what Rao calls Hagen's hypothesis for
    normality. That the total test scores are approximately the sum of
    independent infinitesimals, so the totals (that is, grades) should
    follow a normal distribution. The reason the normality is not
    necessarily seen in any individual class is the same reason that a
    sample of 50 from a normal distribution often looks nothing like the parent.
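A rough simulation of that last remark, assuming sample skewness is an acceptable stand-in for "looks like the parent". The seed, the number of replications and the 0.3 threshold are arbitrary choices for illustration only.

```python
import random
from statistics import mean, pstdev


def sample_skewness(xs):
    """Moment-based skewness: the mean cubed z-score."""
    m = mean(xs)
    s = pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(2005)  # arbitrary seed, fixed only for repeatability
n_per_sample = 50
skews = [
    sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n_per_sample)])
    for _ in range(1000)
]

# The parent has skewness exactly 0, but single samples of 50 wander widely.
spread = pstdev(skews)
share_lopsided = sum(abs(g) > 0.3 for g in skews) / len(skews)
print(f"SD of sample skewness: {spread:.2f}")
print(f"share of samples with |skewness| > 0.3: {share_lopsided:.0%}")
```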
     
    Jerry Dallal, Jun 28, 2005
    #67
  8. Reef Fish

    Herman Rubin Guest

    Approximately means little, if anything, in the tails.

    The Central Limit Theorem is nowhere near that precise.
    Also, it requires that the various components are additive;
    what happens if some are additive and some multiplicative?

    In this case, it is the tails which are important. The
    skewness term to the CLT is maximized at about the 85-th
    percentile, and the kurtosis term at about the 95-th.
    I suggest you check out the error in the CLT. Take a look
    at the tails in comparing binomials with the normal, and
    in the case of the binomial, the various factors have equal
    effect. If this is not the case, the error is greater.
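One way to take up that suggestion, as a sketch: compare exact binomial upper tails with the normal approximation (continuity-corrected) and look at the relative error as the cutoff moves outward. The values n = 100 and p = 0.1 are hypothetical, and the sketch covers only the equal-effect binomial case, not the unequal-factor case mentioned above.

```python
from math import comb
from statistics import NormalDist


def binom_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def normal_upper_tail(n, p, k):
    """Normal approximation to P(X >= k), with continuity correction."""
    mu = n * p
    sigma = (n * p * (1 - p)) ** 0.5
    return 1.0 - NormalDist(mu, sigma).cdf(k - 0.5)


def relative_error(n, p, k):
    """(approximation - exact) / exact for the upper tail at k."""
    exact = binom_upper_tail(n, p, k)
    return (normal_upper_tail(n, p, k) - exact) / exact


n, p = 100, 0.1  # hypothetical: 100 components, each "on" with probability 0.1
for k in (11, 14, 17, 20, 23):
    print(k, f"{binom_upper_tail(n, p, k):.2e}", f"{relative_error(n, p, k):+.0%}")
```

Near the mean the two agree to within a few percent; several SDs out, the approximation is off by a large fraction of the true tail probability.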
    Would any physicist construct a scale in that way?
    When I went to school, we all took IQ tests.
    The SAT involves knowledge, and possibly now more knowledge
    than ability. This is why MENSA no longer allows it.
    How does one standardize a thermometer? One compares it with
    another thermometer. Now there are more precise methods, but
    until recently, all physical units were standardized against
    "standard" instruments, not against frequency distributions on
    varying sets.
    Again, using a sample frequency distribution on a relatively
    small sample to obtain a scale is statistical idiocy.
    Educational psychologists, and other users of statistics, need
    to understand the concepts. Instead, they "run off at the mouth"
    using methods as religious mantras.
     
    Herman Rubin, Jun 28, 2005
    #68
  9. Reef Fish

    Herman Rubin Guest

    Are heights and weights normally distributed? And even if
    they are normally distributed in each of a finite number
    of groups, this means that they are NOT normally distributed
    overall unless the means and variances are equal.
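The mixture point can be checked with exact moments rather than simulation. The sketch below computes the excess kurtosis of a mixture of normal groups; a normal distribution has excess kurtosis 0, so any nonzero value rules out normality (a zero value alone would not prove it). The function name and the height-like numbers are hypothetical.

```python
def mixture_excess_kurtosis(components):
    """Excess kurtosis of a mixture of normals.

    components: list of (weight, mean, sd) tuples; weights must sum to 1.
    Uses the exact central moments of a normal mixture.
    """
    total_w = sum(w for w, _, _ in components)
    if abs(total_w - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mu = sum(w * m for w, m, _ in components)
    m2 = sum(w * (s**2 + (m - mu) ** 2) for w, m, s in components)
    m4 = sum(
        w * (3 * s**4 + 6 * s**2 * (m - mu) ** 2 + (m - mu) ** 4)
        for w, m, s in components
    )
    return m4 / m2**2 - 3.0


# Hypothetical height-like numbers in cm.
same_groups = [(0.5, 170.0, 7.0), (0.5, 170.0, 7.0)]
shifted_means = [(0.5, 162.0, 7.0), (0.5, 176.0, 7.0)]
unequal_sds = [(0.5, 170.0, 5.0), (0.5, 170.0, 10.0)]

for name, mix in [
    ("same groups", same_groups),
    ("shifted means", shifted_means),
    ("unequal SDs", unequal_sds),
]:
    print(name, round(mixture_excess_kurtosis(mix), 3))
```

Groups that differ in mean make the pooled distribution flatter than a normal, and groups that differ only in SD make it heavier-tailed.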
    The same educationists who use normality are the ones who
    started the idea of grading on a curve.
    There is absolutely no reason for this. The CLT might make
    the overall average approximately normal, but it states nothing
    about an overall distribution.
     
    Herman Rubin, Jun 28, 2005
    #69
  10. Reef Fish

    Reef Fish Guest

    You're speaking as a "mathematical statistician" of course.
    In that respect, in my Data Analysis course, before discussing
    analytic and graphical methods of validating "normality", I
    show a one-line proof of the Theorem: NOTHING in the Real
    World is normally distributed. Proof: All measurements are
    finite. QED. Corollary note: Many measurements cannot have
    negative values, e.g. IQ.

    In practice, it is the degree to which the normal distribution
    may adequately serve as an APPROXIMATION.
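To put a number on "degree of approximation", here is a small sketch assuming the common convention of IQ scaled to mean 100 and SD 15. That scaling is an assumption and is not stated anywhere in this thread.

```python
from statistics import NormalDist

# Assumed convention, not stated in the thread: IQ scaled to mean 100, SD 15.
iq_model = NormalDist(mu=100.0, sigma=15.0)

# Mass the normal model places on impossible (negative) values.
p_negative = iq_model.cdf(0.0)
# Mass within two SDs of the mean, where the model is normally used.
p_central = iq_model.cdf(130.0) - iq_model.cdf(70.0)

print(f"P(IQ < 0) under the model: {p_negative:.1e}")
print(f"P(70 < IQ < 130) under the model: {p_central:.4f}")
```

The model is wrong about negative values, as the one-line proof says, but the probability it assigns to them is negligible at this scaling.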

    What do you mean "educationists who use normality"?

    I use normality in data analysis (whenever appropriate) OFTEN. I
    NEVER grade on a curve, as mentioned above.

    For those who teach classes of size in excess of 500 say, and use
    exams that consist of MANY items, there are SOME justifications
    for grading on a curve (though I don't condone it).

    In my opinion, an instructor SHOULD know the ABSOLUTE standards
    in a course which s/he teaches. If s/he doesn't, s/he shouldn't
    be teaching it. Therefore, an ABSOLUTE scale should be set up
    on any Exam of sufficient length, so that if EVERY student makes
    the cut for an "A", they ALL get "A"s. In practice, my own
    empirical distribution for the First Course (required to be taken
    by ALL students: including English, Music, and such majors) it
    wasn't uncommon for some classes to have 5% A's and 40% F's.

    While your assertion is correct about the overall average, you
    overlooked the fact that the INDIVIDUAL exam score IS the AVERAGE
    (total) of scores of the ITEMS (questions) in an exam.

    -- Bob.
     
    Reef Fish, Jun 28, 2005
    #70
  11. Reef Fish

    Radford Neal Guest

    I've never heard of "Hagen's hypothesis", but if it's as stated above,
    it is clearly nonsense. The fact that the score on a test is the sum
    of scores for many items is possibly (under rather strong assumptions)
    a reason to think that the conditional distribution for the test score
    GIVEN the identity of the test subject is normal, but it provides no
    reason at all to think that the distribution of test scores over a
    population of subjects is normal.

    In general, the marginal density of X can be written as

    p(x) = INTEGRAL p(x|y) p(y) dy

    Showing that p(x|y) is normal for any fixed y does NOT show that p(x)
    is normal. It will be normal if and only if p(y) is normal.
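A small exact calculation in the spirit of the integral above, using a discrete p(y) so the integral becomes a sum. It assumes each subject answers items independently with a personal success rate (so the conditional score is binomial, close to normal for 100 items). The two-group population is hypothetical.

```python
from math import comb


def binom_pmf(n, p):
    """Distribution of the total score for ONE subject with per-item success rate p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def marginal_pmf(n, population):
    """p(x) = sum over y of p(x | y) p(y), for a discrete p(y).

    population: list of (share, p) pairs; shares must sum to 1.
    """
    out = [0.0] * (n + 1)
    for share, p in population:
        for k, v in enumerate(binom_pmf(n, p)):
            out[k] += share * v
    return out


def count_modes(pmf):
    """Number of strict interior local maxima."""
    return sum(
        1 for k in range(1, len(pmf) - 1) if pmf[k - 1] < pmf[k] > pmf[k + 1]
    )


n_items = 100
# Hypothetical population: half the subjects get 40% of items right, half get 80%.
population = [(0.5, 0.4), (0.5, 0.8)]

print("modes for one subject:", count_modes(binom_pmf(n_items, 0.4)))
print("modes in the population:", count_modes(marginal_pmf(n_items, population)))
```

Each subject's score distribution has a single peak, yet the shape of the population distribution is driven by how the success rates are spread across subjects.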

    This is not controversial. It's elementary probability theory. And
    that's all I'll have to say on the subject. If anyone here wants to
    continue denying the obvious, other interested readers should just
    read this post again, to see what my reply would be. I won't be
    posting any further on this thread.
     
    Radford Neal, Jun 28, 2005
    #71
  12. Reef Fish

    Reef Fish Guest

    Only a few posts ago, you were TOTALLY confused

    RN> The items are not independent, since, obviously, they all depend on

    RN> the person's IQ.

    which prompted me to comment,

    RF> Here you don't even know what "independent" applies to. It's the
    RF> independence of the test SCORES for many items (for the Multiple
    RF> Choice questions on IQ tests).

    and not the dependence/independence of the score on IQ!

    Jerry Dallal is very capable of speaking for himself, but I wish he
    would postpone it until he has finished his part (or whatever he'll
    do) on the SPSS Manual data fitting, before continuing this
    everlasting argument on the normality issue.

    Given your previous follow-up to my post, to which I had already
    replied, I don't see any reason for me to further comment on this
    subject other than pointing out YOUR confusion, which did not
    help your credibility in your subsequent arguments.

    -- Bob.
     
    Reef Fish, Jun 28, 2005
    #72
  13. Reef Fish

    Herman Rubin Guest

    As the test scores are not approximately the sum of
    "independent infinitesimals", the normal conclusion is
    false. Looking at the conclusions from the genome,
    probably fewer than 100 genes are the dominant ones
    involved, and these are hardly enough. Also, even if
    within one population they MAY be independent, genetic
    drift will make them dependent if populations are mixed.
     
    Herman Rubin, Jun 28, 2005
    #73
  14. Reef Fish

    Radford Neal Guest

    It's always hard to follow resolutions about not posting anymore...

    I'll just say that I knew perfectly well what "Reef Fish" was
    referring to when he talked about independence. I think this is clear
    from my post, the purpose of which was to point out that his
    assumption of independence conditional on the identity of the person
    taking the test is not at all sufficient to justify the conclusion
    that he drew about the distribution of test scores in the population.
     
    Radford Neal, Jun 28, 2005
    #74
  15. Reef Fish

    Herman Rubin Guest

    For many purposes, it matters little. For others, it does.
    When policy is based on forcing things to be normal, it is
    a religious violation of the understanding of statistics.

    However, they DO say this.
    Exactly that. Education courses do advocate grading on a
    curve, with normality as the basis.
    I do not believe that a good examination in mathematics or
    statistics should even have a dozen questions, although some
    of them may have many parts. So there are still no such
    justifications.
    This is almost completely ignored at this time. It is likely
    to get the instructor in trouble with the administration. Are
    there any schools which make any attempt to maintain the old
    standards, or anything like that?
    A small number of items (it would take hundreds to consider
    the CLT as a reasonable approximation) and they are not
    independent, at least if it is a good exam.
     
    Herman Rubin, Jun 28, 2005
    #75
  16. Reef Fish

    E. Wijsman Guest

    Just as non-statisticians should not claim knowledge about statistics that
    they do not have, statisticians should not claim knowledge about genetics
    that they do not have. The statement above is nonsense. I don't know
    where it comes from, but it does not come from an informed read of
    reputable scientific literature.

    Ellen M. Wijsman
    Div. of Medical Genetics and Dept. Biostatistics
    University of Washington
    Seattle, WA 98195-7720
     
    E. Wijsman, Jun 28, 2005
    #76
  17. Reef Fish

    Reef Fish Guest

    I believe at least a good part of our disagreement about the CLT
    effect depended on your preference/assumption/prejudice about
    no more than "a dozen questions".

    All the advanced Math and Stat GREs have DOZENS of questions,
    probably closer to 100 than 12.

    For ALL of my exams, I work very hard to try to minimize both
    the Type I and Type II errors in the total scores in determining
    a letter grade. That's why I allow PLENTY of room for student
    NOT to know the answers to quite a few questions, and yet it is
    very hard for them to get LUCKY. 40% was passing. 75% was an A.

    For a typical Final Exam (for elementary courses, where most of
    the "earned F" grades came from), I may have 50 T/F questions,
    25 Multiple Choice, and 10 Problems (each may have many parts).
    So, in total, there were over 100 questions.

    For a small number of questions, I wouldn't expect the result to
    follow any distribution.

    No SCHOOL except the top few. That's the sad state of affairs
    in our educational system today. It's up to a few outlier
    Professors such as myself, who not only held the line on PRINCIPLE,
    but also could AFFORD it because I had been a tenured full prof
    since 1977, seven years from my Ph.D. degree, and the blatant
    grade inflation didn't start its downhill acceleration until
    the mid to late 1980s.

    Of course it would get an instructor into trouble! At first, the
    administrators looked the other way when 50% of MY students
    dropped before the end or got "F"s. Then they yielded to
    pressure from the students and tried to blame it on ME rather
    than the students who didn't perform, because some of the
    students I flunked would have gotten As or Bs from other
    instructors who "sold their souls to the Devil".

    So, I walked out, having told several of the administrators
    (Chair, Deans, Provosts) where they should go. :) It was so
    "cold Tukey" <G> that I didn't even know Tukey had passed away
    till three YEARS later, in 2003.

    -- Bob.
     
    Reef Fish, Jun 28, 2005
    #77
  18. Reef Fish

    DZ Guest

    Don't know about the number of genes with large effects (if that's
    what is meant by "dominant") but there's nothing wrong with the drift
    with admixture part. Surely if parts of large distinct equilibrium
    populations are mixed into a small one, that will create correlations
    between alleles at different locations for many generations (i.e. due
    to admixture with subsequent drift).
     
    DZ, Jun 28, 2005
    #78
  19. Reef Fish

    E. Wijsman Guest

    There is nothing wrong *in principle* with the drift and admixture
    statement, but both statements show no evidence of an understanding of the
    actual data that exist to support or refute the importance of the effects
    cited on normality of IQ distributions.

    In practice, drift is of tiny effect compared to most other sources of
    error. Relatively small sample sizes are a bigger source of variation
    among human samples than are effects of drift, which only has substantive
    effects in very small and isolated populations. Admixture can create
    associations, but most real-life data show little or no evidence for
    associations among most ordinary markers/genes in most populations that
    have been carefully examined.

    As an explanation for putative non-normality of IQ distributions these
    statements are nonsense since they are simply not grounded in
    understanding of the data that exists in human populations. The statement
    of < 100 major genes is particularly ludicrous since we have absolutely no
    idea how many genes contribute to variation in IQ.

    Ellen Wijsman
    Div. of Medical Genetics and Dept. of Biostatistics
    University of Washington, Seattle
     
    E. Wijsman, Jun 29, 2005
    #79
  20. Reef Fish

    Jerry Dallal Guest

    While the pedant might view my restatement as not rigorous (no pedants
    here), I'm at a loss to see how it is nonsense. While I may be too far
    removed from the classroom to state the Lindeberg-Levy conditions off
    the top of my head, if the CLT doesn't say that the distribution of the
    sum of iid rv's with finite variance converges to the normal
    distribution, then what does it say?
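For reference, the Lindeberg-Levy form in its usual textbook statement. The symbols are introduced here for illustration and do not come from the thread.

```latex
\text{Let } X_1, X_2, \ldots \text{ be i.i.d. with } \mathbb{E}[X_i] = \mu
\text{ and } 0 < \operatorname{Var}(X_i) = \sigma^2 < \infty,
\qquad S_n = \sum_{i=1}^{n} X_i .

\frac{S_n - n\mu}{\sigma\sqrt{n}} \;\xrightarrow{\;d\;}\; N(0,1)
\qquad \text{as } n \to \infty .
```

The convergence is for the standardized sum; the raw sum itself has no limiting distribution, since its mean and variance both grow with n.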

    One of the things I always liked about Rao's Linear Statistical
    Inference was his section 3ai giving a number of situations under which
    data might end up normally distributed. There's
    (i) Hitting the Bull's Eye (Herschel's Hypothesis)
    (ii) Maxwell's Hypothesis
    (iii) Limit of the Binomial (De Moivre's Theorem)
    (iv) Hagen's Hypothesis (Theory of Errors)
    (v) Sum of n Independent Normal Variables

    The rest of this post quotes directly from the first edition of Rao's
    Linear Statistical Inference (p 130):

    Hagen based his proof of the normal law of error under the following
    assumptions:

    (a) An error is the sum of a large number of infinitesimal errors, all
    of equal magnitude, due to different causes.
    (b) The different components of errors are independent.
    (c) Each component of error has an equal chance of being positive or
    negative.
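Assumptions (a) to (c) can be checked numerically. The sketch below treats the error as a sum of n independent components, each +eps or -eps with probability 1/2, and measures the largest gap between its exact CDF and the standard normal CDF. The function name `hagen_cdf_gap` and the chosen values of n are illustrative and are not taken from Rao.

```python
from math import comb, sqrt
from statistics import NormalDist


def hagen_cdf_gap(n):
    """Largest gap between the exact CDF of a Hagen-type error and its normal limit.

    The error is the sum of n independent components, each +eps or -eps with
    probability 1/2. After dividing by its SD, eps * sqrt(n), eps cancels, so
    the standardized error is (2K - n) / sqrt(n) with K ~ Binomial(n, 1/2).
    """
    std_normal = NormalDist()
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        z = (2 * k - n) / sqrt(n)
        gap = max(gap, abs(cdf - std_normal.cdf(z)))  # just below the jump at z
        cdf += comb(n, k) / 2**n
        gap = max(gap, abs(cdf - std_normal.cdf(z)))  # at the jump
    return gap


for n in (4, 16, 64, 256):
    print(n, round(hagen_cdf_gap(n), 4))
```

The gap shrinks as the number of components grows, which is the sense in which the three assumptions lead to the normal law of error.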
     
    Jerry Dallal, Jun 29, 2005
    #80