# Is a most-likely probability 'better' depending on the size of the next-most-likely?

Discussion in 'Scientific Statistics Math' started by Steve, Feb 2, 2010.

1. ### Steve (Guest)

Hi,

I'm working on an algorithm to guess the correct English word within
text in which some words have become illegible.
It boils down to creating a list of candidate words, along with their
probabilities, and choosing the most likely.
By alternating between training data and new test data, I can
establish that the probability estimates are fairly accurate. (Though
to be useful, the algorithm needs to provide a shorter candidate list
in the first place!)

Suppose I have two cases:
A) There are 2 candidate words with probabilities 0.51 and 0.49.
B) There are 101 candidate words, one with P = 0.51, and a hundred
others all with P = 0.0049.
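
To make the two cases concrete, here is a minimal Python sketch that writes them down and compares a few simple numbers: the chance that the top choice is wrong, the gap to the runner-up, and the ratio between the two. The name `summarize` and the dictionary keys are made up for illustration; they are not part of the algorithm described in this post.

```python
# Hypothetical representation: each case is just a list of candidate probabilities.
case_a = [0.51, 0.49]
case_b = [0.51] + [0.0049] * 100


def summarize(probs):
    """Return a few simple confidence numbers for a candidate list."""
    ranked = sorted(probs, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return {
        "total": sum(probs),          # should be (close to) 1.0
        "p_error": 1.0 - best,        # chance the top choice is wrong
        "margin": best - runner_up,   # absolute gap to the runner-up
        "ratio": best / runner_up,    # how many times more likely than the runner-up
    }


summary_a = summarize(case_a)
summary_b = summarize(case_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

Both cases give the same error probability for the top choice; only the margin and the ratio separate them.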

One of the approaches the algorithm takes is based on the N most recent
known words prior to the unknown word (its N-gram), so there are
inevitably situations when the N-gram contains words that have
themselves been corrected in a prior step. If this is the case, I need
to know how much I can rely on that previous result.
Is there any basis for believing that in case B) the result is more
trustworthy? After all, the choice with P=0.51 is more than 100 times
more likely than the next best word. But in case A) there's virtually
nothing to choose between them.
Rightly or wrongly, that's how I intuitively feel about the choices,
but then I remember... both 'best choices' will be wrong 49% of the
time, so it doesn't make any difference!
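
One way to look at how much an earlier correction can be relied on is to avoid committing to it and instead average the next-word prediction over every candidate for the earlier word, weighted by its probability. The sketch below does this for a single uncertain context word. Everything in it is a toy assumption: the function name `mix_over_context`, the words, and the conditional probabilities are invented, and it assumes (for case B) that the hundred unlikely alternatives each point to a different next word.

```python
from collections import defaultdict


def mix_over_context(context_probs, next_given_context):
    """Compute P(next) = sum over c of P(c) * P(next | c) for an uncertain context word c."""
    mixed = defaultdict(float)
    for context_word, p_context in context_probs.items():
        for next_word, p_next in next_given_context[context_word].items():
            mixed[next_word] += p_context * p_next
    return dict(mixed)


# Hypothetical conditional tables, P(next word | context word).
next_given_context = {
    "two": {"cents": 0.8, "much": 0.2},
    "too": {"cents": 0.1, "much": 0.9},
}
# Assumption: in case B each unlikely alternative predicts its own distinct next word.
for i in range(100):
    next_given_context[f"alt{i}"] = {f"next{i}": 1.0}

# Case A shape: one strong rival. Case B shape: a hundred weak rivals.
context_a = {"two": 0.51, "too": 0.49}
context_b = {"two": 0.51, **{f"alt{i}": 0.0049 for i in range(100)}}

mixed_a = mix_over_context(context_a, next_given_context)
mixed_b = mix_over_context(context_b, next_given_context)

best_a = max(mixed_a, key=mixed_a.get)
best_b = max(mixed_b, key=mixed_b.get)
print("case A shape:", best_a, round(mixed_a[best_a], 4))
print("case B shape:", best_b, round(mixed_b[best_b], 4))
```

With these made-up numbers, the single strong rival in the case A shape is enough to overturn the next-word decision, while the scattered rivals in the case B shape are not. That only holds under the stated assumption that the weak alternatives disagree with each other; if they all favoured the same next word, they would act like one strong rival.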

Is there a measure for this, or is it totally irrelevant?
------------

Eventually the goal is to have a much higher confidence than 0.51 in a
single choice, but there will occasionally be situations with these
borderline results. In these cases I'll offer the user a drop-down
replacement list with all the choices and their probabilities, for
them to pick from.
Presumably they would feel the same way, that they would be more
confident making a choice in case B) than A).

Any thoughts?... Is this a bit of a Monty Hall problem?

Thanks

Steve

Steve, Feb 2, 2010

2. ### David Jones (Guest)

Have you thought of involving a cost function? This would give a value/cost/utility to choosing word B, if word A is actually correct. Then some aspects of your problem would eventually become generalised to comparing a single alternative with its cost, with lots of small probabilities each having different costs. In such a case, you might prefer the second if a lot of the small probabilities are associated with small costs and only a few with high costs.

Here "cost" might be used to distinguish similar words with similar meanings from similar words with different meanings.
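
As a rough illustration of the cost idea, the sketch below picks the candidate with the lowest expected cost rather than the highest probability. The words, the probabilities, and the cost values (0 for a correct choice, 0.2 for a near-synonym, 1 for an unrelated word) are all invented for the example, as are the names `expected_cost` and `best_by_cost`.

```python
# Hypothetical candidate list for one illegible word.
probs = {"bag": 0.40, "big": 0.35, "large": 0.25}

# Assumption: pairs of words treated as having similar meaning.
SIMILAR = {frozenset({"big", "large"})}


def cost(choice, truth):
    """Made-up cost of choosing `choice` when `truth` is the correct word."""
    if choice == truth:
        return 0.0
    if frozenset({choice, truth}) in SIMILAR:
        return 0.2  # wrong word, but close in meaning
    return 1.0      # wrong word with a different meaning


def expected_cost(choice, probs, cost):
    """Average cost of `choice` over all possible true words."""
    return sum(p * cost(choice, truth) for truth, p in probs.items())


def best_by_cost(probs, cost):
    """Candidate with the lowest expected cost."""
    return min(probs, key=lambda word: expected_cost(word, probs, cost))


most_probable = max(probs, key=probs.get)
lowest_cost = best_by_cost(probs, cost)
print("most probable:", most_probable)
print("lowest expected cost:", lowest_cost)
```

With these numbers the most probable word and the lowest-cost word differ, because two of the candidates are cheap to confuse with each other.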

David Jones

David Jones, Feb 2, 2010

3. ### Steve (Guest)

Thanks David, I hadn't thought of that idea. There are a lot of
parameters linked to each candidate, such as meaning, part-of-speech,
usage-frequency, collocation-frequency, context likelihood etc. so I
could certainly shape some kind of cost for going against the grain of
these.

I'm still wondering if there's some simple heuristic involved with
cases like these though.

An example might be if I collected millions of usenet postings and
found significant amounts of these examples:

"Just my two * worth" and found * = cents 51% of the time, and
rupees, yen etc. 4.9% of the time each, for 10 variations.

and

"I married my * in a church" with * = wife 51% and husband 49%.

Even if you were sure the probabilities were very accurate, the
'cents' example just seems a safer bet because each alternative is
quite unlikely.

Steve

Steve, Feb 2, 2010
4. ### David Jones (Guest)

The cost approach has the potential that, with an extremely large amount of work, you could do a thorough application of decision theory to tell you what to choose in any given case. But it can also help to think about the problem. One of the essential parts is a probability like that of "being wrong if I choose this one", as this would weight the cost of the choice. If you think about these, rather than the probability that "this one is right", then it would help to justify your feeling about the interpretation to be made when you have lots of small probabilities... these would convert to lots of instances where the probability of being wrong is high.
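
A small sketch of that change of viewpoint, using the two cases from the first post: for each candidate, compute the probability of being wrong if that candidate is chosen, which is simply one minus its probability. The variable and function names are illustrative only.

```python
# The two cases from the first post, as candidate probability lists.
case_a = [0.51, 0.49]
case_b = [0.51] + [0.0049] * 100


def p_wrong_if_chosen(probs):
    """Probability of being wrong for each candidate, if that candidate were chosen."""
    return [1.0 - p for p in probs]


wrong_a = p_wrong_if_chosen(case_a)
wrong_b = p_wrong_if_chosen(case_b)

# Skip the first entry (the top choice) to find the least risky alternative.
best_alternative_a = min(wrong_a[1:])
best_alternative_b = min(wrong_b[1:])
print("case A, top choice wrong:", round(wrong_a[0], 4),
      "| best alternative wrong:", round(best_alternative_a, 4))
print("case B, top choice wrong:", round(wrong_b[0], 4),
      "| best alternative wrong:", round(best_alternative_b, 4))
```

The top choice is equally risky in both cases, but in case B every alternative is almost certain to be wrong, whereas in case A the alternative is nearly as good as the top choice.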

David Jones

David Jones, Feb 3, 2010
5. ### Steve (Guest)

It's definitely given me a new perspective on weighing these kinds of
choices. There's no shortage of test data to try the cost approach and
see how it performs compared to simple probability alone.

Thanks

Steve

Steve, Feb 3, 2010