Compare two sets of proportions

Discussion in 'Scientific Statistics Math' started by SDC, Dec 11, 2010.

  1. SDC

    SDC Guest


    I would like a test to tell me if two sets of proportions are "similar". I
    have a set of proportions p1, p2, ..., pn and q1, q2, ..., qn (where p1+p2+
    .... +pn =1 and q1+q2+ ... +qn =1) and would like a test that tells me if:

    p1 = q1 and p2=q2 and ... pn = qn.

    (I also have the counts behind the proportions if that is useful information
    to have)

    SDC, Dec 11, 2010

  2. SDC

    danheyman Guest

    The first test is to plot the pairs (p_i,q_i); if it looks like a 45
    deg. straight line you're in business. If most points fit, examine the
    others to see if there might be a good reason they don't fit. (Fewer
    counts perhaps.) If you want to use a formal test, you need to have a
    probability model. The chi-square goodness-of-fit test might apply,
    but without a probability model it can't be justified.
    danheyman, Dec 11, 2010

  3. SDC

    Rich Ulrich Guest

    The proportions give you something that is useful
    to eyeball.

    The Ns are needed for a test, since they are what
    give some indication of how much anyone should
    believe a proportion.

    The usual test is the Pearson contingency table
    chi-squared. Google can find you a chi-square calculator.
    Here is one -

    I suggest that you compute and look at both the row and
    column proportions while you are reckoning how uniform
    the proportions are. For your k x 2 table, whatever
    unbalance exists might be easier to see by contrasting
    the sets of 2 instead of looking at the sets of k.
    Rich Ulrich, Dec 11, 2010
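
The Pearson contingency-table chi-squared suggested above can be sketched in a few lines of plain Python. This is only an illustration: the counts in `table` are made up, and the function names are hypothetical helpers, not part of any library. It computes the statistic for a k x 2 table and also prints the row and column proportions that the post recommends looking at.

```python
# Hypothetical counts for illustration only (not from the thread):
# rows are k = 3 categories, columns are the two samples.
table = [
    [30, 45],
    [50, 40],
    [20, 15],
]


def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def row_proportions(table):
    """Each cell divided by its row total (contrasts the sets of 2)."""
    return [[x / sum(row) for x in row] for row in table]


def column_proportions(table):
    """Each cell divided by its column total (contrasts the sets of k)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[x / col_totals[j] for j, x in enumerate(row)] for row in table]


stat, df = pearson_chi2(table)
print(f"chi-squared = {stat:.3f} on {df} df")
print("row proportions:", row_proportions(table))
print("column proportions:", column_proportions(table))
```

The statistic is then compared with a chi-squared distribution on (k - 1) degrees of freedom, using a table or a calculator.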
  4. SDC

    SDC Guest

    A bit more background. The p and q represent the proportion of people in
    each of six age groups. What I want to know is whether the proportions in
    each age group are the same, considering all age groups simultaneously. The
    p's are from my survey and the q's are from the national census, so the
    counts behind the p's are much, much smaller than those behind the q's.

    I thought that the assumption would be that each sample was from a
    multivariate multinomial distribution and the test would come from this.
    SDC, Dec 12, 2010
  5. Luis A. Afonso, Dec 12, 2010
  6. SDC

    SDC Guest

    Thanks, so if I have, say, six age groups then I think I only have 6 pairs
    to compare, e.g., if p are my survey proportions and q are those from the
    census, the calculations are:

    Abs(p1 - q1)
    Abs(p2 - q2)
    Abs(p3 - q3)
    Abs(p4 - q4)
    Abs(p5 - q5)
    Abs(p6 - q6)
    SDC, Dec 12, 2010
  7. SDC

    Rich Ulrich Guest

    The Pearson chi-square test of a contingency table, or for
    goodness of fit, can be justified under any of several distributional
    assumptions, including Poisson and normal.

    If you want to use the population proportions, then
    what you test will be the observed Ns for a fixed set
    of expected p's -- "goodness of fit" to the expected
    proportions. The test still requires that you provide your
    original total-N and the set of proportions, or, equivalently,
    the set of Ns for the separate cells.

    The formula for calculation by hand will be shorter if you
    use the proportions instead of the total-population numbers.
    You will find more on-line calculators that are set up for
    comparing two sets of Ns.
    Rich Ulrich, Dec 13, 2010
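
A minimal sketch of the goodness-of-fit calculation described above, treating the census proportions as fixed expected values and the survey counts as the observed Ns. The counts and proportions are hypothetical numbers chosen for illustration, and `goodness_of_fit` is a made-up helper name. The 5% critical value for 5 degrees of freedom (11.070) is the standard chi-squared table value.

```python
# Hypothetical survey counts for six age groups (illustration only).
observed = [18, 25, 30, 22, 15, 10]
# Hypothetical census proportions for the same groups; they must sum to 1.
census_p = [0.15, 0.20, 0.25, 0.20, 0.12, 0.08]


def goodness_of_fit(observed, expected_p):
    """Pearson chi-squared of observed counts against fixed proportions."""
    n = sum(observed)  # original total N of the survey
    stat = 0.0
    for o, p in zip(observed, expected_p):
        e = n * p  # expected count in this cell
        stat += (o - e) ** 2 / e
    df = len(observed) - 1  # cells minus one, since the counts sum to n
    return stat, df


stat, df = goodness_of_fit(observed, census_p)

# Standard table value: upper 5% point of chi-squared with 5 df.
CRITICAL_5PCT_DF5 = 11.070
print(f"chi-squared = {stat:.3f} on {df} df")
print("differs from census at the 5% level:", stat > CRITICAL_5PCT_DF5)
```

Only the survey total enters the calculation; the census counts are not needed because the census proportions are taken as known.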
  8. Date: Dec 12, 2010 1:00 PM
    Author: SDC
    Subject: Re: Compare two sets of proportions

    Abs(p1 - q1)
    Abs(p2 - q2)
    Abs(p3 - q3)
    Abs(p4 - q4)
    Abs(p5 - q5)
    Abs(p6 - q6)

    My response

    Use a two-tailed t-test in each of the FIVE comparisons

    Luis A. Afonso, Dec 13, 2010
  9. SDC

    Bruce Weaver Guest

    Here's an online calculator for the goodness of fit test, should you
    decide to go that way.
    Bruce Weaver, Dec 13, 2010
  10. p_j = national proportion

    Z(i, j) = |p_i - p_j| / s

    s = sqrt[ p_i*(1 - p_i)/n_i + p_j*(1 - p_j)/n_j ]

    approx. Z(i, j) = |p_i - p_j| / sqrt[ p_i*(1 - p_i)/n_i ]

    If n_i > 100 (say) you could use the normal distribution; otherwise it is better to use Student's t with n_i - 1 degrees of freedom.

    Luis A. Afonso, Dec 13, 2010
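
The two Z formulas above can be written out as follows. This is a sketch under stated assumptions: the function names and the sample values are hypothetical, with p_i and n_i standing for a survey proportion and its sample size, and p_j and n_j for the national proportion and its (much larger) count.

```python
import math


def z_two_sample(p_i, n_i, p_j, n_j):
    """Z(i, j) with both sampling variances in the denominator."""
    s = math.sqrt(p_i * (1 - p_i) / n_i + p_j * (1 - p_j) / n_j)
    return abs(p_i - p_j) / s


def z_approx(p_i, n_i, p_j):
    """Approximate Z(i, j): the national variance term is dropped,
    which is reasonable when n_j is very large compared with n_i."""
    return abs(p_i - p_j) / math.sqrt(p_i * (1 - p_i) / n_i)


# Hypothetical values for one age group (illustration only).
p_survey, n_survey = 0.25, 120
p_national, n_national = 0.20, 1_000_000

print(f"Z        = {z_two_sample(p_survey, n_survey, p_national, n_national):.4f}")
print(f"approx Z = {z_approx(p_survey, n_survey, p_national):.4f}")
```

With a census-sized n_j the two values are nearly identical, which is why the approximate form is usually enough here.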