multiple testing (continued)

Discussion in 'Scientific Statistics Math' started by Nicolas Bonneel, Nov 16, 2010.

  1. Hi,
    After a somewhat recent discussion here, I became convinced
    that when performing multiple tests, p-values need to be
    corrected to account for the multiplicity (e.g., Bonferroni).

    I now wonder the following: if we pool all the literature on
    all possible topics containing experimental data and p-values,
    the issue of multiple testing comes back at a larger scale.
    Even if each paper has correctly accounted for multiple testing
    internally, it is still very likely that, among all papers, a
    very small (corrected) p-value (or several) has been achieved
    just by chance, and thus the null hypothesis rejected even
    though the effect is not genuine. Given the number of papers
    available (on Google Scholar, I count 1.67 million containing
    the word "p-value", 6.29 million with the word "experiment"...),
    I believe this to be highly probable.

    Does this mean that when doing a Bonferroni correction, all
    papers on all topics (including future ones!) have to be taken
    into account in the correction? ;)
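
    To make the worry concrete, here is a toy simulation sketch
    (my own invented numbers, nothing from any real paper): 10,000
    "papers", each running 5 tests of a true null hypothesis and
    applying its own Bonferroni correction; a noticeable fraction
    still comes out "significant" purely by chance.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_papers, m_tests, n_obs, alpha = 10_000, 5, 30, 0.05

    lucky_papers = 0
    for _ in range(n_papers):
        # m_tests comparisons per "paper"; both groups come from the SAME
        # distribution, so every null is true and any rejection is spurious
        group_a = rng.normal(size=(m_tests, n_obs))
        group_b = rng.normal(size=(m_tests, n_obs))
        _, p = ttest_ind(group_a, group_b, axis=1)
        # within-paper Bonferroni: multiply each p-value by the number of tests
        if (p * m_tests).min() < alpha:
            lucky_papers += 1

    print(f"{lucky_papers} of {n_papers} 'papers' report a corrected p < {alpha}")
    # With all nulls true, roughly 5% of the "papers" (a few hundred here) still do.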

    Cheers
     
    Nicolas Bonneel, Nov 16, 2010
    #1

  2. Rich Ulrich (Guest), in reply to Nicolas Bonneel:

    Technical discussions of testing usually include prescriptions
    about "experiment-wise" or "family-wise" error rates, or some
    such. So... the individual experimenter is in the clear. Also,
    Bonferroni does not have much of a role in meta-analyses.
    Meta-analyses use various methods to combine results that are
    appropriate when the *same* hypothesis is tested more than once.

    The question of how much faith to put into published results
    -- a problem for meta-analysis -- is often called the
    "file-drawer problem."

    Historically, the problem is that if you don't get "results"
    (a significant p-value), there has been a presumption that you
    probably (or at least potentially) did a lousy job of designing
    and measuring; therefore you don't deserve to be published, and
    editors reject your paper.

    That is, what you say about *published* papers misses the core
    of a more serious problem. If twenty studies are done and 19
    don't have "a significant p-value"... how many of those don't
    even *try* to get published, and how many of them fail to get
    published because of their "lack of results"?
    - I don't know its current progress, but there has been a
    movement to have all studies (starting with government-funded
    ones) register when they begin, with the intention that their
    results should be incorporated in a proper framework for
    decisions.
     
    Rich Ulrich, Nov 16, 2010
    #2

  3. Paul (Guest), in reply to Nicolas Bonneel:

    That is indeed a serious problem, particularly, I think, in the
    social sciences. In the (doctoral) methods seminar I teach, we
    discuss published papers, mainly from business journals but
    some from psychology, labor relations, social work, etc. Students
    select the papers from their respective disciplines. To the best
    of my recollection, we've *never* seen a confirmatory study,
    which suggests that once a result is "proved" (as a mathematician
    I shudder when empirical researchers use that word) it loses
    all publication value to future researchers. We've seen a very
    small number of papers contradicting earlier findings, but in
    all cases they changed the model (adding predictors, switching
    from single-level to multilevel analysis, ...). Besides not seeing
    confirmatory studies, I don't think we've ever seen one where
    the same model was run on a different sample and failed to
    confirm the original paper's findings.

    I wish the entrepreneur who's about to unveil the n-th social
    networking site for researchers would instead create a site
    where confirmations/refutations of published empirical work
    could be logged (including working papers if not published).

    /Paul
     
    Paul, Nov 16, 2010
    #3
  4. I guess I'm getting more and more skeptical about p-values.
    I actually just finished reading the paper
    "The Insignificance of Statistical Significance Testing":
    http://www.stats.org.uk/statistical-inference/Johnson1999.pdf
    It is pretty harsh on p-values, but that seems quite well
    justified, and I find the paper excellent... :s

    In particular, one claim is that p-values are popular because
    it is very easy to get statistically significant results with
    them: for a given, arbitrarily small effect, increasing the
    number of samples directly decreases the p-value. I guess I'll
    have to forget about the statistics courses I took!
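
    To convince myself, I tried a small sketch of exactly that claim
    (my own toy example, not taken from the Johnson paper): a fixed
    effect of 0.05 standard deviations -- practically negligible --
    tested with ever larger samples.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    true_effect = 0.05  # 5% of a standard deviation, the same at every sample size

    for n in (100, 1_000, 10_000, 100_000):
        control = rng.normal(0.0, 1.0, size=n)
        treated = rng.normal(true_effect, 1.0, size=n)
        _, p = ttest_ind(treated, control)
        print(f"n = {n:>7}: p = {p:.3g}")

    # In a typical run the p-value is nowhere near significance for n = 100 or
    # n = 1,000, then drops far below 0.05 for the larger samples, even though
    # the underlying effect never changed.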

    Cheers
     
    Nicolas Bonneel, Nov 17, 2010
    #4
  5. Rich Ulrich (Guest), in reply to Nicolas Bonneel:

    I don't think it covers the ground as well as the paper by
    psychologists included in his references; see below.
    As for the claim that a larger N directly lowers the p-value:
    that is indeed how the formula works. It applies pretty well to
    controlled studies -- in which Ns aren't always so easily
    increased.

    The comment is usually made in the context of warning against
    trusting "p-values" to tell you something important. However,
    the most common large studies are "observational" studies, and
    there are various reasons, which different people will describe
    in different ways, why tests that behave that way are not
    appropriate for those studies.

    The author of the paper you cite does not seem to be aware of
    those arguments, and he does abuse the effect-size argument.
    In September, we saw a post about "The Cult of Statistical
    Significance", which was a re-discovery of the problems of
    significance testing by economists. One user (cross-)posted
    about it, and I replied:
    ==== from Sept 23, 2010.
    Okay, that is a new book by (I guess) economists.

    It seems to cover the same territory that psychologists
    covered in the 1990s.

    The main conclusion of that debate (IMHO) is that you
    need to describe and use "effect sizes" in addition to tests.

    [snip, LL Harlow reference]
    ==== end of post.
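
    As a rough illustration of that conclusion (my own invented
    numbers, not taken from any of those books): report an effect
    size such as Cohen's d, with an approximate confidence interval,
    next to the p-value, so the magnitude stays visible no matter
    how large the N is.

    import numpy as np
    from scipy.stats import ttest_ind

    def cohens_d(x, y):
        # standardized mean difference with a rough large-sample 95% CI
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
        d = (x.mean() - y.mean()) / np.sqrt(pooled_var)
        se = np.sqrt((nx + ny) / (nx * ny) + d**2 / (2 * (nx + ny)))
        return d, (d - 1.96 * se, d + 1.96 * se)

    rng = np.random.default_rng(2)
    treated = rng.normal(0.05, 1.0, size=50_000)   # tiny true effect, huge N
    control = rng.normal(0.00, 1.0, size=50_000)

    _, p = ttest_ind(treated, control)
    d, (lo, hi) = cohens_d(treated, control)
    print(f"p = {p:.1e}, Cohen's d = {d:.3f} (95% CI {lo:.3f} to {hi:.3f})")
    # The test is "highly significant", but a d of about 0.05 makes it plain
    # that the effect itself is trivial.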

    And here is a post I made in July 2003.
    ===== July 5, 2003
    (me)
    It has been included as the first essay in the book "What If
    There Were No Significance Tests?", edited by Harlow et al.

    I especially appreciated that book for including the
    contribution of Robert P. Abelson subtitled "If there were no
    significance tests, they would be invented."
    I also recommend his book "Statistics as Principled Argument."
    =======end of post.
     
    Rich Ulrich, Nov 17, 2010
    #5
  6. Herman Rubin (Guest), in reply to Nicolas Bonneel:

    Decision theory shows that the larger the sample size,
    the smaller the cutoff point for the p-value should be.

    The real problem is when we know that the point null
    is false. If the standard deviation of the usual
    estimator is large compared with the width of the
    "null", the integral of the Type 1 loss, as a function
    of the parameter, can be placed as a point value at
    the null safely, but otherwise the problem gets more
    complicated. I consider this in my 1970 paper, "A
    decision-theoretic approach to the problem of testing a null
    hypothesis."

    Unfortunately, if the width of the acceptance region is
    comparable with the standard deviation of the usual point
    estimate, it seems that there is no rule which does not depend
    heavily on the form of the product of the Type 1 loss and the
    prior in the region of acceptance.
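
    A crude illustration of the sample-size point (not the analysis
    in my paper; just a normal mean with known unit variance, a
    point null, and a N(0, tau^2) prior on the mean under the
    alternative): the z-score at which the Bayes factor between the
    two hypotheses is even grows with n, so the corresponding
    p-value cutoff shrinks as the sample size increases.

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm

    tau2 = 1.0  # assumed prior variance of the effect under the alternative

    def log_bf01(z, n):
        # log Bayes factor of H0 (mu = 0) versus H1 (mu ~ N(0, tau2)),
        # given the observed z-statistic from a sample of size n
        r = n * tau2 / (1.0 + n * tau2)
        return 0.5 * np.log(1.0 + n * tau2) - 0.5 * z**2 * r

    for n in (10, 100, 1_000, 10_000, 100_000):
        z_star = brentq(lambda z: log_bf01(z, n), 0.1, 20.0)  # z where evidence is even
        p_star = 2 * norm.sf(z_star)
        print(f"n = {n:>6}: break-even z = {z_star:.2f}, p-value cutoff = {p_star:.1e}")

    # The break-even p-value falls steadily as n grows, so holding alpha fixed at
    # 0.05 amounts to accepting weaker and weaker evidence at larger sample sizes.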

    Are there other papers on this?
     
    Herman Rubin, Nov 17, 2010
    #6
