Discussion in 'Probability' started by PavelK, Oct 22, 2010.

1. PavelK, Guest

I do apologize in advance for, perhaps, a silly/meaningless question.

Suppose that k independent studies have been carried out to estimate
parameters of some population distribution. What I have is k reports,
each describing its sample (in particular its size n_i), the estimated
mean m_i, and the estimated variance s_i. Also suppose that all sample
distributions are normal as is the population distribution.

I'd like to reason about these reports. To be more precise, I'd like
to judge whether they are mutually consistent or not. Is there a
standard criterion of consistency which I can use to do that?

My question may be slightly different from a standard hypothesis test
(or I can't see why it's not different). I do
*not* have a hypothesis. I simply want to know whether there *exists*
a hypothesis (not necessarily a unique one) which is supported by all
the sample distributions. Does this question make sense?

If it doesn't, then there's no need to read further.

If it does, then let's define the set M of normal sample distributions
P_i which are consistent (whatever the criterion is) with some fixed
sample distribution P_0. Can I describe this set by using "boundary"
values of m_i and s_i? An example could be something like an
"interval" [(m_l,s_l), (m_u,s_u)] such that:

i) any reported distribution with an estimated mean m_i < m_l (or m_i
> m_u) would be inconsistent with P_0, and
ii) any reported distribution with an estimated mean m_l < m_i < m_u
would be consistent with P_0.

As you can guess, mathematical statistics is certainly not my area, so
an answer like "go and read any stats book!" would be OK, but more
precise pointers would be very welcome. Thanks.

PavelK, Oct 22, 2010

2. Ray Koopman, Guest

All you need is for the populations to be normal. Some (or even all)
of the samples will look non-normal some of the time, and that's OK.
Yes. It's a minority view, but I think it's the right view. Consider
the simple situation in which we have two normal populations with
known variances but unknown means, and we want to know if the means
differ. The orthodox approach looks at m_1 - m_2, the difference
between the sample means, and does a z-test. You and I would look at
the confidence region for (µ_1,µ_2): (m_1 - µ_1)^2*n_1/sigma_1^2 +
(m_2 - µ_2)^2*n_2/sigma_2^2 <= ChiSquare(2,1-alpha), and ask whether
it contains any points with µ_1 = µ_2.
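The disagreement between the two approaches can be seen numerically. Here is a minimal sketch (the sample sizes, means, and SDs are hypothetical, chosen so that the two approaches disagree):

```python
import math

# Hypothetical numbers; sigma1, sigma2 are the known population SDs.
m1, m2 = 10.0, 10.88
sigma1, sigma2 = 2.0, 2.0
n1, n2 = 50, 50
alpha = 0.05

z_crit = 1.959964                  # standard normal 0.975 quantile
chi2_crit = -2 * math.log(alpha)   # chi-square(2 df) 1-alpha quantile, ~5.9915

# Orthodox z-test on the difference of the sample means
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = (m1 - m2) / se
z_rejects = abs(z) > z_crit

# Confidence-region approach: does the ellipse
#   (m1-mu1)^2*n1/sigma1^2 + (m2-mu2)^2*n2/sigma2^2 <= chi2_crit
# contain any point with mu1 == mu2?  Minimizing the left-hand side
# along the line mu1 = mu2 = x gives the precision-weighted mean.
w1, w2 = n1 / sigma1**2, n2 / sigma2**2
x = (w1 * m1 + w2 * m2) / (w1 + w2)
q = w1 * (m1 - x)**2 + w2 * (m2 - x)**2
region_contains_equal_means = q <= chi2_crit
```

With these numbers the z-test rejects (|z| = 2.2 > 1.96) while the joint region still contains points on the line µ_1 = µ_2 (q = 4.84 < 5.99), which is exactly the kind of disagreement described next.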

It is possible for the z-test to reject the hypothesis that µ_1 = µ_2,
even though the confidence region approach contains points with µ_1
= µ_2. The orthodox argument is that the actual values of the means
don't matter, that the only thing that matters is their difference
and whether it is reasonable to take it as zero. I think that's
nonsense in most real-world situations; it's unusual for the actual
values of the means to be of absolutely no interest.
Unfortunately, it's a little more complicated than that. See

Arnold, B.C., & Shavelle, R.M. (1998). Joint confidence
sets for the mean and variance of a normal distribution.
American Statistician, 52, 133-140.

for a discussion of problems with just one group. Simultaneous
confidence regions for k groups would be messier. The question
would be "does the simultaneous region for all 2k parameters
(µ_i,sigma_i), i = 1...k, contain any points with all µ_i = µ and
all sigma_i = sigma?".
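For the means alone, with known variances, that k-group question has a simple form: minimize the sum of squared standardized deviations over a common mean and compare to a chi-square(k) critical value. A hedged sketch (the full problem with unknown sigma_i is messier, as the reference above discusses; all numbers in the usage lines are hypothetical):

```python
import math

# Known-variance sketch only: does the joint confidence region for
# (mu_1, ..., mu_k) contain a point with all mu_i equal to a common mu?
# Minimizing sum_i n_i*(m_i - x)^2 / sigma_i^2 over x gives the
# precision-weighted mean; compare the minimum to a chi-square(k) quantile.
def common_mean_plausible(ms, sigmas, ns, chi2_crit):
    w = [n / s**2 for s, n in zip(sigmas, ns)]
    x = sum(wi * mi for wi, mi in zip(w, ms)) / sum(w)
    q = sum(wi * (mi - x)**2 for wi, mi in zip(w, ms))
    return q <= chi2_crit

CHI2_95_DF3 = 7.8147  # upper 5% point of chi-square with 3 df

# Three close means are mutually consistent; an outlying one is not:
# common_mean_plausible([10.0, 10.2, 9.9], [1.0]*3, [30]*3, CHI2_95_DF3)
# common_mean_plausible([10.0, 12.0, 9.9], [1.0]*3, [30]*3, CHI2_95_DF3)
```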

Ray Koopman, Oct 23, 2010

3. PavelK, Guest

[snip]
Great, thanks for the pointer! Yes, this is exactly the question I'm
interested in.

I'm glad you've been able to make some sense out of my question. In
fact, I'm a logician and in logic we always work with sets of
statements and ask whether they are consistent, i.e. whether there
exists a model which satisfies all of them. This is why I wondered if
a similar approach, although non-standard, could work with statistics
(where statements are descriptions of sample distributions).

Pavel

PavelK, Oct 23, 2010

4. PavelK, Guest

Is there a good discussion of the difference between the two views? It
seems that there can also be a situation when a test would confirm the
hypothesis that µ_1 = µ_2, while this point lies outside of the
simultaneous confidence region (as defined above). Or not?

Also, am I right that the confidence region can be defined for any
distribution, not only the normal one? And for which distributions
are there exact analytic confidence regions?

Thanks again,
Pavel

PavelK, Oct 25, 2010

5. Ray Koopman, Guest

Not that I know of, although I don't expect that I'm the first one
to complain about the situation. The basis of my objection is that
testing H: µ_1 = µ_2 is not the same as testing H: (µ_1,µ_2) = (x,x)
in which x has a particular value. There is an established but less
well known procedure for testing the latter hypothesis, and to me
the results of those tests have a higher logical priority than the
results of tests of the global H: µ_1 = µ_2, which should be done
only if the particular values of µ_1 and µ_2 are of no interest,
i.e., if only their difference matters.
For this simple problem, the simultaneous confidence region is an
ellipse, centered at (m_1,m_2) with axes parallel to the x_1 and x_2
axes. The corresponding region based on the test of H: µ_1 = µ_2 is
a linear band that runs parallel to the line x_1 = x_2, extends to
infinity in the (+,+) and (-,-) directions, and contains (m_1,m_2)
midway between its edges. So a putative (µ_1,µ_2) can be in either,
neither, or both regions.
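The geometry can be checked directly. A minimal sketch with hypothetical numbers (unit variances, n = 1, sample means at the origin), showing a point in each region but not the other:

```python
import math

# Two acceptance regions for a putative (x1, x2) in the
# known-variance two-group setting; numbers are hypothetical.
m1, m2 = 0.0, 0.0
sigma1 = sigma2 = 1.0
n1 = n2 = 1
alpha = 0.05
z_crit = 1.959964
chi2_crit = -2 * math.log(alpha)   # chi-square(2 df) quantile, ~5.9915

def in_ellipse(x1, x2):
    """Joint region: ellipse centered at (m1, m2), axes parallel to x1, x2."""
    return ((m1 - x1)**2 * n1 / sigma1**2
            + (m2 - x2)**2 * n2 / sigma2**2) <= chi2_crit

def in_band(x1, x2):
    """Region from the z-test of the difference: band around x1 - x2 = m1 - m2."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return abs((m1 - m2) - (x1 - x2)) / se <= z_crit

# (3, 3) lies in the band but outside the ellipse;
# (2, -1) lies in the ellipse but outside the band.
```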

However, neither procedure can *confirm* µ_1 = µ_2 to the exclusion
of all other possibilities. Such confirmation would occur only if
the ellipse shrank to a single point (x,x), or the band narrowed to
become the line x_1 = x_2.
Well, I suppose you could talk about a confidence region for an
entire distribution, but it's more usual to talk about a confidence
region for the parameters of the distribution. (If there is only one
parameter then it's a confidence interval, a 1-dimensional region.)

The simultaneous confidence region for several parameters
(µ_1,µ_2,...) is defined as the set of all points (x_1,x_2,...) that
would not be rejected by a test of H: (µ_1,µ_2,...) = (x_1,x_2,...).
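In one dimension this definition reduces to something familiar: inverting the z-test for a single normal mean (sigma known) gives back the usual interval m ± z_crit·sigma/√n. A minimal sketch:

```python
import math

# A confidence region is the set of parameter values the test does not
# reject.  Inverting the one-sample z-test for H: mu = x recovers the
# interval m +/- z_crit * sigma / sqrt(n).
def not_rejected(x, m, sigma, n, z_crit=1.959964):
    return abs(m - x) * math.sqrt(n) / sigma <= z_crit

# With m = 0, sigma = 1, n = 4 the interval is roughly (-0.98, 0.98):
# x = 0.97 is not rejected, x = 1.0 is.
```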