# Best 20 Questions Questions

Discussion in 'Probability' started by HumingBean, Jan 10, 2007.

1. ### HumingBeanGuest

In the game of 20 Questions you may ask only Yes or No questions to try
to guess a target such as a pencil or Elvis Presley in as few questions
as possible. Please assume that I've determined the target is one of
the Three Stooges, Larry, Curly or Moe.

Which of the following, Question A or Question B, is better to lead off
with?

Question A: Is it Larry?

Question B: Is it someone other than Larry?

(Note that in both questions above I chose Larry arbitrarily.)

If the answer to Question A is Yes then I solved it in one question,
and that will happen 33% of the time.

If the answer to Question A is No, which will happen 67% of the time,
then I've got a 50% chance of solving it in a total of two questions
(Is it Larry? No. Is it Moe? Yes.) and a 50% chance of solving it in
three (Is it Larry? No. Is it Curly? No. Is it Moe? Yes.).

If the answer to Question B is No then I've solved it in two questions
(Is it someone other than Larry? No. Is it Larry? Yes.), which will
happen 33% of the time.

If the answer to Question B is Yes, which will happen 67% of the time,
then I've got a 50% chance of solving it in a total of two questions
(Is it someone other than Larry? Yes. Is it Moe? Yes.) and a 50%
chance of solving it in three (Is it someone other than Larry? Yes. Is
it Curly? No. Is it Moe? Yes.)

Accordingly, I conclude I should start with Question A, not Question B.
Have I analyzed this correctly? If so, is there a way to express in
math terms what I've written in prose? If if have not analyzed this
correctly, what is the reasoning I should have used?

What about more complicated circumstances such as 4 or 13 choices

Thanks for any suggestions you have.

HumingBean, Jan 10, 2007

2. ### Pavel314Guest

At this point, rather than reverting to the positive form of the question,
ask "Is it someone other than Moe?". If the answer is yes, you know it's
Curly because Larry has been ruled out by the first question. If the answer
is no, you know it's Moe. For either response you have your answer in a
maximum of two questions.

Paul

Pavel314, Jan 11, 2007

3. ### hagmanGuest

Your question strategy can be modeled by a binary tree, where at each
node (=question)
there is a YES branch and a NO branch.

Thus with at most n questions, you cannot distinguish more than 2^n
different objects,
as there are 2^n leaf nodes in a binary tree of depth n.
However, in the game of 20 questions, it is not enough to have
determined the object, you must do so by
a final positively answered "Is it x?" question.
Therefore, for half the leaf nodes (those at a final NO branch), you
need an extra question.
This expands the tree by one level and ensures that all leaf nodes are
on YES branches (and
some penultimate nodes have no NO branch)
Put the other way around, with n questions, You can force a win only if
there are at most 2^(n-1) objects.
Thus, if M(m) is the worst case number of questions needed for m
objects, we immediately find
M(1)=1 (even though you know the result beforehand!)
M(2)=2
M(3)=3
M(4)=3
M(5)=4
....
M(8)=4
M(9)=5
....
In general, M(n) = ceiling(ld(n))+1.
Apparently, an optimal strategy would be to always cut in half as good
as possible at every step.
Note that under /this/ analysis, "Is it Larry?" and "Is it someone
other than Larry?"
are both equally good, as neither can avoid that 3 questions might be
necessary in worst case.

Another problem would be:
If I have more items than coverable by the questions I am allowed, how
can I maximize
the probability of winning?
Clearly, with n questions allowed, one should simply select 2^(n-1) of
the possible objects
and play as if all other items were impossible.
I all given items are eually likely, it doesn't matter which you
select.
Otherwise, you should select the 2^(n-1) most likely results.

Still another problem (and closest to what you elaborated i your post):
How can I minimize the expected number of questions needed?
I will (as you did) assume that all possible objects are equally
likely.
Then we are looking for a binary tree where all leafs are on YES
branches such that the
sum of all paths lengths from the root to the various leafs is minimal.

Let E(n) be the minimal such sum over path lengths under the constraint
that there are n leafs.
Then we have
E(1)=1
E(2)=3
E(3)=6
E(4)=10
as a few starting values.
For higher n, we have
(1) E(n) = n + min_{0<k<n} (F(k)+E(n-k))
where F(1)=0 and F(k)=E(k) otherwise (this special treatment is to
treat the special
case of an immediate win when asking for a single object).
Thus
F(1)=0
F(2)=3
F(3)=6
F(3)=10

Since we consider (1) only for n>1 and F(1)+E(n-1) <
F(n-1)+E(1)=E(n-1)+E(1), we have
(2) F(n) = n + min_{0<k<n} (F(k)+F(n-k))
To simplify further, we may restrict to k<=n/2.

Both when minimizing the worst case as when maximizing the winning
probability,
one optimal strategy was to always cut in half as good as possible.
Could a different strategy be advantageous when minimizing the expected
number of questions?
E.g., without detailed calculation it looks tempting to split of powers
of 2 or the like slightly bigger
than n/2 and hope for easier cases with a certain probability.
Obviously, some special care needs to be taken for n=3 (your analysis).
Alas, we have the following:

PROPOSITION: Let G(n) =F(n+1)-F(n). Then for each n>2 the following
statements are true:
A(n): F(n) = n + F([n/2]) + F([(n+1)/2])
B(n): G(r) is non-decreasing for 1<=r<n.

Proof:
A(3) is clear as F(3) = 3+F(1)+F(2), A(4) is clear as F(4) =
4+F(2)+F(2).
B(3) and B(4) are also trivially clear form G(1)=3, G(2)=3, G(3)=4.

Assume n>4 and that A(n') and B(n') holds for all m with 2<n'<n.
Write n=2m+d with d=0 or 1, depending on whether n is odd or even.
According to (2), we have
F(n) = n+F(k)+F(n-k)
for some k with 0<k<=m.
By minimality,
n+F(k)+F(n-k) <= n+F(m)+F(n-m),
hence
F(n-k)-F(n-m) <= F(m)-F(k).
The left hand side is G(n-m)+...+G((n-m)+(m-k-1)), the right hand side
is G(k)+...+G(k+(m-k-1)).
Since n-m>=k we conclude from B(n-1) that the left hand side is at
least as big as the right hand side (note that (n-m)+(m-k-1) < n-1).
Thus
F(n) = n+F(k)+F(n-k) = n+F(m)+F(n-m),
and A(n) follows from m=[n/2] and n-m=[(n+1)/2].
To show B(n), we need only show G(n-1)>=G(n-2).
Indeed, from A(n), A(n-1), A(n-2), we have
(i) if n=2m is even:
F(n) = n+F(m)+F(m); F(n-1) = n-1 + F(m-1)+F(m); F(n-2)=n-2 +
F(m-1)+F(m-1).
Thus G(n-1)-G(n-2) = F(n)-2F(n-1)+F(n-2) = 0.
(ii) if n=2m+1 is odd:
F(n) = n+F(m)+F(m+1); F(n-1) = n-1 + F(m)+F(m); F(n-2)=n-2 +
F(m-1)+F(m).
Thus G(n-1)-G(n-2) = F(n)-2F(n-1)+F(n-2) = F(m+1)+F(m-1) >=0.
In both cases we conclude G(n-1)>=G(n-2) and thus B(n).
Therefore, the proposition follows by induction. QED.

As a consequence of the proposition we have found that one (not the)
optimal strategy is
to always ask for [n/2] of the remaining objects (this implies asking
specifically for 1 of three objects).

BTW, the actual expected number of questions needed for n>1 objects is
F(n)/n.
With a bit more analysis, one can show that
ld(4/3*n)
is a fairly good approximation for F(n)/n, in fact that
F(n)/n >= ld(4/3 * n) for all n>1
with equality if and only if n is 3 times a power of 2.

PROBLEM:
Determine
sup { F(n)/n - ld(4/3 * n) }

hagman, Jan 13, 2007
4. ### William ShanleyGuest

Your two questions A and B result in the same amount of information
gained. The only difference is that in situation B, you don't consider
the game over, even though you have determined the answer is Larry.
The second "guess" is not a guess at all. Mathematically, you discover
the answer on the same turn of the game.