Identifying the most anomalous member of a set

Discussion in 'Probability' started by Prof Wonmug, Sep 21, 2010.

  1. Prof Wonmug

    Prof Wonmug Guest

    I have a large database containing the text from various sources
    (newspapers, magazines, books, etc.) from 1950-1999. The text is
    broken down by decade (50s, 60s, 70s, 80s, 90s). I have a program that
    counts the number of times each word occurs in the text in each
    decade.

    Here are the counts for a few words:

      50s    60s    70s    80s    90s
      120      0      0      0      0
    23051     12     19     55      9
        1   6032    501      2     28
      537  25384   1544    818    220
        2    220   2285      5     13
        6     86   1322     12     58
     1075    331   3266  11882    319
       45     18    143   1541     16
        1      0      3      0   4156
      189    143    959    283  22541

    I am not sure what the correct terminology is. I would like to
    calculate the "skew" in each of these numbers. That is, I would like
    to know how unlikely it is to get "120" in the first row. I guess this
    would be as compared to having the numbers evenly distributed across
    the decades (24 24 24 24 24).

    I would like to calculate this value for each number, but I am only
    interested in the largest in each row.
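    One way to make the comparison against an even split concrete is to
    standardize the largest count against the count expected if every decade
    were equally likely. The sketch below is only an illustration: the
    function name max_cell_z and the decade labels are made up for this
    example, and it assumes each decade contributes the same amount of text.

```python
import math

def max_cell_z(counts):
    """Return (index, z) for the largest cell under an equal-probability null.

    Hypothetical helper: each of the k cells is assumed to have probability 1/k.
    """
    k = len(counts)
    n = sum(counts)
    expected = n / k                           # e.g. 120 / 5 = 24 per decade
    sd = math.sqrt(n * (1 / k) * (1 - 1 / k))  # binomial standard deviation of one cell
    i = max(range(k), key=lambda j: counts[j])
    return i, (counts[i] - expected) / sd

decades = ["50s", "60s", "70s", "80s", "90s"]
i, z = max_cell_z([120, 0, 0, 0, 0])
print(decades[i], round(z, 2))
```

    A large positive z means the largest decade sits far above the even
    split; turning z into a probability still requires a tail calculation
    that accounts for having picked the maximum of five cells.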
     
    Prof Wonmug, Sep 21, 2010
    #1

  2. Ray Koopman

    Ray Koopman Guest

    The p-values for the maximum in each row are all infinitesimal:

    5.34 * 10^-106
    1.66 * 10^-19901
    2.65 * 10^-4607
    1.15 * 10^-18450
    2.29 * 10^-1705
    1.85 * 10^-963
    3.75 * 10^-5824
    1.80 * 10^-1089
    3.47 * 10^-3607
    1.95 * 10^-17670

    In the notation of

    Fuchs, C., & Kenett, R. (1980). A test for detecting outlying cells
    in the multinomial distribution and two-way contingency tables.
    Journal of the American Statistical Association 75, 395-398,

    the p-values were obtained by setting M+* to the observed max Zi,
    and inverting the upper bound for M+* in equation 3.6.
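    Reading off the code in this post (not quoting the paper, whose equation
    3.6 is not reproduced here), the computation appears to amount to a
    Bonferroni-type bound on the largest standardized cell. With f_i the
    counts, k the number of cells, n the total, and equal cell probabilities
    1/k assumed:

```latex
Z_i = \frac{f_i - n/k}{\sqrt{n \cdot \tfrac{1}{k}\left(1 - \tfrac{1}{k}\right)}}
    = \frac{k f_i - n}{\sqrt{n (k - 1)}},
\qquad
p \;\le\; k \left[ 1 - \Phi\!\left( \max_i Z_i \right) \right]
  = \frac{k}{2}\, \operatorname{erfc}\!\left( \frac{\max_i Z_i}{\sqrt{2}} \right)
  = \frac{k}{2}\, \operatorname{erfc}\!\left( \frac{k \max_i f_i - n}{\sqrt{2 n (k - 1)}} \right)
```

    Here \Phi is the standard normal distribution function, so the bound
    relies on the normal approximation to each cell count.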

    For the record, the Mathematica code I used was

    (* f = vector of counts; k = number of cells; n = total count *)
    px[f_?VectorQ] := With[{k = Length@f, n = Total@f},
    Erfc[(k*Max@f - n)/Sqrt[2n(k-1.)]]*k/2 ]
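    For readers without Mathematica, here is a rough Python port of the same
    expression, (k/2) * erfc((k*max - n) / sqrt(2n(k-1))), taking k as the
    number of cells and n as the total count. Because values such as
    10^-19901 underflow ordinary floating point, this sketch returns log10 of
    the p-value and switches to the standard asymptotic series for erfc when
    the argument is large. The function names and the cutoff of 10 are
    choices made for this example, not part of the original code.

```python
import math

def log10_erfc(x):
    """log10(erfc(x)) for x >= 0, avoiding underflow for large x."""
    if x < 10:
        return math.log10(math.erfc(x))
    # Asymptotic series: erfc(x) ~ exp(-x^2) / (x*sqrt(pi)) * (1 - 1/(2x^2) + 3/(4x^4))
    series = 1 - 1 / (2 * x**2) + 3 / (4 * x**4)
    ln_value = -x * x - math.log(x * math.sqrt(math.pi)) + math.log(series)
    return ln_value / math.log(10)

def log10_p_max(counts):
    """log10 of the upper bound on the p-value for the largest cell.

    Assumes equal cell probabilities and a nonzero total count.
    """
    k = len(counts)
    n = sum(counts)
    x = (k * max(counts) - n) / math.sqrt(2 * n * (k - 1))
    return math.log10(k / 2) + log10_erfc(x)

rows = [
    [120, 0, 0, 0, 0],
    [23051, 12, 19, 55, 9],
]
for row in rows:
    lp = log10_p_max(row)
    exponent = math.floor(lp)
    print(f"{10 ** (lp - exponent):.2f} * 10^{exponent}")
```

    Working in log10 keeps the mantissa and exponent separate, which is why
    the result is printed in the same "m * 10^e" form used in the list of
    p-values.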
     
    Ray Koopman, Sep 21, 2010
    #2