Benford & Zipf's Laws: Application

Discussion in 'Scientific Statistics Math' started by voice_of_reason, Dec 1, 2010.

  1. Greetings:

    I am considering using Benford's and/or Zipf's law to detect
    fraudulent test results reported by a contracted agency.

    When researching methods of application, I came across the following
    caveat:

    "Benford's law can only be applied to data that is distributed across
    multiple orders of magnitude."

    In our case, the acceptable range for test results is 7.000~7.999.

    What I would like to know is whether I can simply scale the data
    (effectively subtract 7 and multiply by 1000) and then apply Benford's
    law to analyze the remainder values. If I can, would the remaining
    digits be considered as being in their original positions or their
    current ones? For example, if the original value is 7.100 -> 100, does
    the "1" count as falling in the first position or the second?
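
    To make the positional question concrete, here is a minimal sketch of
    the proposed scaling. The function names and the sample values other
    than 7.100 are hypothetical, and the factor of 1000 is an assumption
    chosen to match the 7.100 -> 100 example. It only shows which digit
    ends up leading after scaling; it does not show that Benford's law
    is appropriate for such data.

```python
from decimal import Decimal


def scaled_remainder(value: str, offset: str = "7", factor: int = 1000) -> int:
    """Subtract the offset and scale to an integer, e.g. '7.100' -> 100.

    Decimal is used so that values such as 7.1 are not distorted by
    binary floating point before the digits are inspected.
    """
    return int((Decimal(value) - Decimal(offset)) * factor)


def leading_digit(n: int) -> int:
    """First significant digit of a positive integer."""
    if n <= 0:
        raise ValueError("leading digit is undefined for values <= 0")
    return int(str(n)[0])


# Illustrative inputs only; not real test results.
for raw in ("7.100", "7.012", "7.003"):
    remainder = scaled_remainder(raw)
    print(raw, "->", remainder, "leading digit:", leading_digit(remainder))
```

    In this sketch the leading digit of the remainder is the first
    non-zero decimal digit of the original value, so it can come from the
    first, second, or third decimal place depending on the value. A
    result of exactly 7.000 has no leading digit at all.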

    Advice and assistance appreciated!

    Thanx!!
     
    voice_of_reason, Dec 1, 2010
    #1

  2. voice_of_reason

    Gordon Sande Guest

    A restricted example of when this might occur is when there is some
    alternate unit of measure which differs by a multiplicative constant,
    for example grams or ounces.
    I once overheard a situation in which reported times of events
    tended to end in either 0 or 5 much too often, which suggested that
    the recording of the events was greatly delayed when they were
    supposed to be recorded almost immediately after they happened.
     
    Gordon Sande, Dec 1, 2010
    #2

  3. I suggest you shouldn't. The question that you posted demonstrates
    clearly that you are quite clueless how this works, so you are very,
    very likely to either not detect any fraud or to make false
    accusations; both could be very expensive. Hire someone who knows what
    they are doing, like some first year maths student.
     
    christian.bau, Dec 1, 2010
    #3
  4. voice_of_reason

    Rich Ulrich Guest

    Natural data that are distributed across several orders of magnitude
    are often log-normal. If taking the log of the 'scores' doesn't give
    a much more balanced distribution of some sort, you don't have
    a chance of applying Zipf's Law usefully. The Wiki page on
    Zipf's Law seems to be helpful. It suggests a regression.
    Well, if the "7" was an artificial component to a value.... Does
    subtracting 7 leave something that looks like a log-normal?
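
    As a rough illustration of the regression idea mentioned above, the
    sketch below fits a straight line to log(frequency) against log(rank)
    by ordinary least squares. The helper name and the synthetic counts
    are assumptions made for illustration, not data from this thread. A
    slope near -1 is what an idealized Zipf distribution would give.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) versus log(rank).

    Frequencies are ranked from largest (rank 1) to smallest.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, idealized Zipf-like counts: frequency proportional to 1/rank.
synthetic = [1200 / rank for rank in range(1, 51)]
slope = loglog_slope(synthetic)
print(f"log-log slope: {slope:.3f}")
```

    Real data will scatter around the fitted line, so the slope alone
    does not establish that the data follow Zipf's law.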

    The advice seems appropriate, that you should hire someone.
     
    Rich Ulrich, Dec 1, 2010
    #4
  5. I am not the least bit ashamed of being "clueless"...we all are before
    we learn. But at least I'm willing to ask and learn. I have often
    found that when someone criticizes a question rather than providing an
    answer, it's generally because they are unable to answer....

    Thanks anyway...
     
    voice_of_reason, Dec 2, 2010
    #5
  6. Thank you for your reply......

    Thank you, I'll check out the page. But I'm curious to know: does a
    lack of log-normal distribution in the data itself indicate that the
    data may well not be "natural" data?
     
    voice_of_reason, Dec 2, 2010
    #6
  7. voice_of_reason

    Tim Little Guest

    No.


    - Tim
     
    Tim Little, Dec 2, 2010
    #7
  8. voice_of_reason

    Rob Johnson Guest

    Benford's Law applies to data which spans several decades. Averaging
    over the decades is what evens out the fractional part of the common
    logarithm. See <http://www.whim.org/nebula/math/benford.html>

    Your data spans about 5.8% of a decade (according to the way that
    Benford's Law counts decades). Artificially spreading out that data
    will have the opposite effect from averaging over several decades.
    Benford's Law most likely will not apply to this data.
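
    The 5.8% figure can be checked with a short calculation. This is a
    minimal sketch; the function names are hypothetical. It computes the
    fraction of a decade covered by the range 7.000 to 7.999 and the
    standard Benford first-digit probabilities log10(1 + 1/d) for
    comparison.

```python
import math


def benford_first_digit_probs():
    """Benford's expected probability for each leading digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def decade_fraction(low, high):
    """Fraction of one decade (factor of 10) spanned by [low, high]."""
    return math.log10(high / low)


probs = benford_first_digit_probs()
span = decade_fraction(7.000, 7.999)

print(f"range 7.000 to 7.999 spans {span:.3f} of a decade")
for digit, p in probs.items():
    print(f"P(first digit = {digit}) = {p:.3f}")
```

    The span comes out at roughly 0.058, which is essentially the single
    slice that Benford's law assigns to a leading digit of 7. Every raw
    value in the range has the same first digit, so there is no spread
    across decades for the law to average over.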

    Rob Johnson <>
    take out the trash before replying
    to view any ASCII art, display article in a monospaced font
     
    Rob Johnson, Dec 2, 2010
    #8
  9. voice_of_reason

    kunzmilan Guest

    Benford's and Zipf's laws are only information laws, and they only
    approximate. Fraudulent test results cannot be detected from mere
    differences from expected results, only by finding, for example,
    overlooked [missing] data.
    kunzmilan
     
    kunzmilan, Dec 3, 2010
    #9