About consistency of sample distributions

Discussion in 'Probability' started by PavelK, Oct 22, 2010.

  1. PavelK

    PavelK Guest

    I do apologize in advance for, perhaps, a silly/meaningless question.

    Suppose that k independent studies have been carried out to estimate
    parameters of some population distribution. What I have is k reports,
    each describing its sample (in particular its size n_i), the estimated
    mean m_i, and the estimated variance s_i. Also suppose that all sample
    distributions are normal as is the population distribution.

    I'd like to reason about these reports. To be more precise, I'd like
    to judge whether they are mutually consistent or not. Is there a
    standard criterion of consistency which I can use to do that?

    I'm reading about statistical hypothesis testing, but my case seems to
    be slightly different (or I can't see why it's not different). I do
    *not* have a hypothesis. I simply want to know whether there *exists*
    a hypothesis (not necessarily a unique one) which is supported by all
    the sample distributions. Does this question make sense?

    If it doesn't then there's no need to read further :)

    If it does, then let's define the set M of normal sample distributions
    P_i which are consistent (whatever the criterion is) with some fixed
    sample distribution P_0. Can I describe this set by using "boundary"
    values of m_i and s_i? An example could be something like an
    "interval" [(m_l,s_l), (m_u,s_u)] such that:

    i) any reported distribution with an estimated mean m_i < m_l (or
    m_i > m_u) would be inconsistent with P_0, and
    ii) any reported distribution with an estimated mean m_l < m_i < m_u
    would be consistent with P_0.

    As you can guess, math. statistics is certainly not my area, so any
    answer like "go and read any stat book!!" would be OK, but more
    precise pointers would be very welcome. Thanks.
    PavelK, Oct 22, 2010

  2. Ray Koopman

    Ray Koopman Guest

    All you need is for the populations to be normal. Some (or even all)
    of the samples will look non-normal some of the time, and that's OK.

    Yes, your question makes sense. It's a minority view, but I think
    it's the right view. Consider
    the simple situation in which we have two normal populations with
    known variances but unknown means, and we want to know if the means
    differ. The orthodox approach looks at m_1 - m_2, the difference
    between the sample means, and does a z-test. You and I would look at
    the confidence region for (µ_1,µ_2): (m_1 - µ_1)^2*n_1/sigma_1^2 +
    (m_2 - µ_2)^2*n_2/sigma_2^2 <= ChiSquare(2,1-alpha), and ask whether
    it contains any points with µ_1 = µ_2.
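A minimal numeric sketch of that check (hypothetical function names, not standard library code; it uses the closed form -2 ln(alpha) for the chi-square df=2 quantile, so only the Python standard library is needed):

```python
import math
from statistics import NormalDist

def region_contains_equal_means(m1, m2, n1, n2, sd1, sd2, alpha=0.05):
    """Does the joint confidence region for (mu_1, mu_2) meet the line mu_1 = mu_2?

    The quadratic form (m1-x)^2*n1/sd1^2 + (m2-x)^2*n2/sd2^2, minimized
    over x, is smallest at the precision-weighted mean of m1 and m2.
    """
    w1, w2 = n1 / sd1**2, n2 / sd2**2
    q_min = w1 * w2 * (m1 - m2) ** 2 / (w1 + w2)
    # chi-square(df=2) upper-alpha quantile has the closed form -2*ln(alpha)
    return q_min <= -2.0 * math.log(alpha)

def z_test_rejects(m1, m2, n1, n2, sd1, sd2, alpha=0.05):
    """Orthodox two-sided z-test of mu_1 = mu_2 with known variances."""
    z = (m1 - m2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)
```

When |z| falls between 1.96 and sqrt(-2 ln 0.05), roughly 2.45, the two answers differ: the z-test rejects equality while the joint region still contains points with mu_1 = mu_2.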

    It is possible for the z-test to reject the hypothesis that µ_1 = µ_2,
    even though the confidence region approach contains points with µ_1
    = µ_2. The orthodox argument is that the actual values of the means
    don't matter, that the only thing that matters is their difference
    and whether it is reasonable to take it as zero. I think that's
    nonsense in most real-world situations; it's unusual for
    the actual values of the means to be of absolutely no interest.
    Unfortunately, it's a little more complicated than that. See

    Arnold, B.C., & Shavelle, R.M. (1998). Joint confidence
    sets for the mean and variance of a normal distribution.
    American Statistician, 52, 133-140.

    for a discussion of problems with just one group. Simultaneous
    confidence regions for k groups would be messier. The question
    would be "does the simultaneous region for all 2k parameters
    (µ_i,sigma_i), i = 1...k, contain any points with all µ_i = µ and
    all sigma_i = sigma?".
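For k reports that each give (n_i, m_i, s_i), one large-sample way to sketch that question is the likelihood-ratio test of complete homogeneity (all µ_i equal and all sigma_i equal). The function below is a hypothetical sketch, not an established routine; it assumes the reported variances are maximum-likelihood (divide-by-n) values and leans on the asymptotic chi-square approximation with 2(k-1) degrees of freedom, which is even, so the p-value has a closed form in the standard library:

```python
import math

def homogeneity_p_value(reports):
    """Likelihood-ratio test that k normal samples share one mean and one
    variance, computed from summary statistics alone.

    reports: list of (n_i, m_i, v_i) with v_i the MLE (divide-by-n) variance.
    Returns the asymptotic p-value; small values flag inconsistent reports.
    """
    N = sum(n for n, _, _ in reports)
    mu0 = sum(n * m for n, m, _ in reports) / N          # pooled mean under H0
    # pooled MLE variance under H0: within-sample plus between-sample spread
    v0 = sum(n * v + n * (m - mu0) ** 2 for n, m, v in reports) / N
    stat = sum(n * math.log(v0 / v) for n, _, v in reports)  # -2 log lambda
    df = 2 * (len(reports) - 1)
    # chi-square survival function for even df: exp(-x/2) * sum (x/2)^j / j!
    half, term, s = stat / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        s += term
    return math.exp(-half) * s
```

Three reports with nearly identical means and variances give a p-value near 1, while two reports whose means differ by several standard errors give an essentially zero p-value.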
    Ray Koopman, Oct 23, 2010

  3. PavelK

    PavelK Guest

    Great, thanks for the pointer! Yes, this is exactly the question I'm
    interested in.

    I'm glad you've been able to make some sense out of my question. In
    fact, I'm a logician and in logic we always work with sets of
    statements and ask whether they are consistent, i.e. whether there
    exists a model which satisfies all of them. This is why I wondered if
    a similar approach, although non-standard, could work with statistics
    (where statements are descriptions of sample distributions).

    Now I'm off to read about joint confidence regions. Thanks again,

    PavelK, Oct 23, 2010
  4. PavelK

    PavelK Guest

    Is there a good discussion of the difference between the two views? It
    seems that there can also be a situation when a test would confirm the
    hypothesis that µ_1 = µ_2, while this point lies outside of the
    simultaneous confidence region (as defined above). Or not?

    Also, am I right that a confidence region can be defined for any
    probability distribution? If yes, is there anything to read about
    which distributions admit exact analytic confidence regions?

    Thanks again,
    PavelK, Oct 25, 2010
  5. Ray Koopman

    Ray Koopman Guest

    Not that I know of, although I don't expect that I'm the first one
    to complain about the situation. The basis of my objection is that
    testing H: µ_1 = µ_2 is not the same as testing H: (µ_1,µ_2) = (x,x)
    in which x has a particular value. There is an established but less
    well known procedure for testing the latter hypothesis, and to me
    the results of those tests have a higher logical priority than the
    results of tests of the global H: µ_1 = µ_2, which should be done
    only if the particular values of µ_1 and µ_2 are of no interest,
    i.e. if only their difference matters.

    For this simple problem, the simultaneous confidence region is an
    ellipse, centered at (m_1,m_2) with axes parallel to the x_1 and x_2
    axes. The corresponding region based on the test of H: µ_1 = µ_2 is
    a linear band that runs parallel to the line x_1 = x_2, extends to
    infinity in the (+,+) and (-,-) directions, and contains (m_1,m_2)
    midway between its edges. So a putative (µ_1,µ_2) can be in either,
    neither, or both regions.

    However, neither procedure can *confirm* µ_1 = µ_2 to the exclusion
    of all other possibilities. Such confirmation would occur only if
    the ellipse shrank to a single point (x,x), or the band narrowed to
    become the line x_1 = x_2.
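A small sketch of the two regions for a concrete data set (hypothetical function names; the df=2 chi-square quantile is again written as -2 ln(alpha)). It shows one point that is in the ellipse but not the band, one in the band but not the ellipse, and one in both:

```python
import math
from statistics import NormalDist

# Illustrative data: m1 = 0, m2 = 1, n1 = n2 = 9, known sigma_1 = sigma_2 = 1.
M1, M2, N1, N2, SD1, SD2, ALPHA = 0.0, 1.0, 9, 9, 1.0, 1.0, 0.05

def in_ellipse(x1, x2):
    """Joint confidence region for (mu_1, mu_2): an axis-parallel ellipse."""
    q = (M1 - x1) ** 2 * N1 / SD1**2 + (M2 - x2) ** 2 * N2 / SD2**2
    return q <= -2.0 * math.log(ALPHA)  # chi-square(2) quantile, closed form

def in_band(x1, x2):
    """Region from the test of the difference: a band parallel to x_1 = x_2."""
    se = math.sqrt(SD1**2 / N1 + SD2**2 / N2)
    return abs((M1 - M2) - (x1 - x2)) <= NormalDist().inv_cdf(1 - ALPHA / 2) * se
```

For instance, (0.5, 0.5) lies on the line x_1 = x_2 inside the ellipse but outside the band, so the joint region admits equal means even though the z-test rejects them; (2.0, 2.9) is in the band only; the center (0, 1) is in both.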

    Well, I suppose you could talk about a confidence region for an
    entire distribution, but it's more usual to talk about a confidence
    region for the parameters of the distribution. (If there is only one
    parameter then it's a confidence interval, a 1-dimensional region.)

    The simultaneous confidence region for several parameters
    (µ_1,µ_2,...) is defined as the set of all points (x_1,x_2,...) that
    would not be rejected by a test of H: (µ_1,µ_2,...) = (x_1,x_2,...).
    As you may have discovered already in reading about estimating the
    mean and s.d. simultaneously, such confidence regions can be easier
    to talk about than to get.
    Ray Koopman, Oct 26, 2010
