Best 20 Questions Questions

Discussion in 'Probability' started by HumingBean, Jan 10, 2007.

  1. HumingBean

    HumingBean Guest

    In the game of 20 Questions you may ask only Yes or No questions to try
    to guess a target such as a pencil or Elvis Presley in as few questions
    as possible. Please assume that I've determined the target is one of
    the Three Stooges, Larry, Curly or Moe.

    Which of the following, Question A or Question B, is better to lead off
    with?

    Question A: Is it Larry?

    Question B: Is it someone other than Larry?

    (Note that in both questions above I chose Larry arbitrarily.)

    If the answer to Question A is Yes then I solved it in one question,
    and that will happen 33% of the time.

    If the answer to Question A is No, which will happen 67% of the time,
    then I've got a 50% chance of solving it in a total of two questions
    (Is it Larry? No. Is it Moe? Yes.) and a 50% chance of solving it in
    three (Is it Larry? No. Is it Curly? No. Is it Moe? Yes.).


    If the answer to Question B is No then I've solved it in two questions
    (Is it someone other than Larry? No. Is it Larry? Yes.), which will
    happen 33% of the time.

    If the answer to Question B is Yes, which will happen 67% of the time,
    then I've got a 50% chance of solving it in a total of two questions
    (Is it someone other than Larry? Yes. Is it Moe? Yes.) and a 50%
    chance of solving it in three (Is it someone other than Larry? Yes. Is
    it Curly? No. Is it Moe? Yes.)

    Accordingly, I conclude I should start with Question A, not Question B.
    Have I analyzed this correctly? If so, is there a way to express in
    math terms what I've written in prose? If if have not analyzed this
    correctly, what is the reasoning I should have used?


    What about more complicated circumstances such as 4 or 13 choices
    instead of 3?

    Thanks for any suggestions you have.
     
    HumingBean, Jan 10, 2007
    #1
    1. Advertisements

  2. HumingBean

    Pavel314 Guest

    At this point, rather than reverting to the positive form of the question,
    ask "Is it someone other than Moe?". If the answer is yes, you know it's
    Curly because Larry has been ruled out by the first question. If the answer
    is no, you know it's Moe. For either response you have your answer in a
    maximum of two questions.

    Paul
     
    Pavel314, Jan 11, 2007
    #2
    1. Advertisements

  3. HumingBean

    hagman Guest

    Your question strategy can be modeled by a binary tree, where at each
    node (=question)
    there is a YES branch and a NO branch.

    Thus with at most n questions, you cannot distinguish more than 2^n
    different objects,
    as there are 2^n leaf nodes in a binary tree of depth n.
    However, in the game of 20 questions, it is not enough to have
    determined the object, you must do so by
    a final positively answered "Is it x?" question.
    Therefore, for half the leaf nodes (those at a final NO branch), you
    need an extra question.
    This expands the tree by one level and ensures that all leaf nodes are
    on YES branches (and
    some penultimate nodes have no NO branch)
    Put the other way around, with n questions, You can force a win only if
    there are at most 2^(n-1) objects.
    Thus, if M(m) is the worst case number of questions needed for m
    objects, we immediately find
    M(1)=1 (even though you know the result beforehand!)
    M(2)=2
    M(3)=3
    M(4)=3
    M(5)=4
    ....
    M(8)=4
    M(9)=5
    ....
    In general, M(n) = ceiling(ld(n))+1.
    Apparently, an optimal strategy would be to always cut in half as good
    as possible at every step.
    Note that under /this/ analysis, "Is it Larry?" and "Is it someone
    other than Larry?"
    are both equally good, as neither can avoid that 3 questions might be
    necessary in worst case.


    Another problem would be:
    If I have more items than coverable by the questions I am allowed, how
    can I maximize
    the probability of winning?
    Clearly, with n questions allowed, one should simply select 2^(n-1) of
    the possible objects
    and play as if all other items were impossible.
    I all given items are eually likely, it doesn't matter which you
    select.
    Otherwise, you should select the 2^(n-1) most likely results.


    Still another problem (and closest to what you elaborated i your post):
    How can I minimize the expected number of questions needed?
    I will (as you did) assume that all possible objects are equally
    likely.
    Then we are looking for a binary tree where all leafs are on YES
    branches such that the
    sum of all paths lengths from the root to the various leafs is minimal.

    Let E(n) be the minimal such sum over path lengths under the constraint
    that there are n leafs.
    Then we have
    E(1)=1
    E(2)=3
    E(3)=6
    E(4)=10
    as a few starting values.
    For higher n, we have
    (1) E(n) = n + min_{0<k<n} (F(k)+E(n-k))
    where F(1)=0 and F(k)=E(k) otherwise (this special treatment is to
    treat the special
    case of an immediate win when asking for a single object).
    Thus
    F(1)=0
    F(2)=3
    F(3)=6
    F(3)=10

    Since we consider (1) only for n>1 and F(1)+E(n-1) <
    F(n-1)+E(1)=E(n-1)+E(1), we have
    (2) F(n) = n + min_{0<k<n} (F(k)+F(n-k))
    To simplify further, we may restrict to k<=n/2.

    Both when minimizing the worst case as when maximizing the winning
    probability,
    one optimal strategy was to always cut in half as good as possible.
    Could a different strategy be advantageous when minimizing the expected
    number of questions?
    E.g., without detailed calculation it looks tempting to split of powers
    of 2 or the like slightly bigger
    than n/2 and hope for easier cases with a certain probability.
    Obviously, some special care needs to be taken for n=3 (your analysis).
    Alas, we have the following:

    PROPOSITION: Let G(n) =F(n+1)-F(n). Then for each n>2 the following
    statements are true:
    A(n): F(n) = n + F([n/2]) + F([(n+1)/2])
    B(n): G(r) is non-decreasing for 1<=r<n.

    Proof:
    A(3) is clear as F(3) = 3+F(1)+F(2), A(4) is clear as F(4) =
    4+F(2)+F(2).
    B(3) and B(4) are also trivially clear form G(1)=3, G(2)=3, G(3)=4.

    Assume n>4 and that A(n') and B(n') holds for all m with 2<n'<n.
    Write n=2m+d with d=0 or 1, depending on whether n is odd or even.
    According to (2), we have
    F(n) = n+F(k)+F(n-k)
    for some k with 0<k<=m.
    By minimality,
    n+F(k)+F(n-k) <= n+F(m)+F(n-m),
    hence
    F(n-k)-F(n-m) <= F(m)-F(k).
    The left hand side is G(n-m)+...+G((n-m)+(m-k-1)), the right hand side
    is G(k)+...+G(k+(m-k-1)).
    Since n-m>=k we conclude from B(n-1) that the left hand side is at
    least as big as the right hand side (note that (n-m)+(m-k-1) < n-1).
    Thus
    F(n) = n+F(k)+F(n-k) = n+F(m)+F(n-m),
    and A(n) follows from m=[n/2] and n-m=[(n+1)/2].
    To show B(n), we need only show G(n-1)>=G(n-2).
    Indeed, from A(n), A(n-1), A(n-2), we have
    (i) if n=2m is even:
    F(n) = n+F(m)+F(m); F(n-1) = n-1 + F(m-1)+F(m); F(n-2)=n-2 +
    F(m-1)+F(m-1).
    Thus G(n-1)-G(n-2) = F(n)-2F(n-1)+F(n-2) = 0.
    (ii) if n=2m+1 is odd:
    F(n) = n+F(m)+F(m+1); F(n-1) = n-1 + F(m)+F(m); F(n-2)=n-2 +
    F(m-1)+F(m).
    Thus G(n-1)-G(n-2) = F(n)-2F(n-1)+F(n-2) = F(m+1)+F(m-1) >=0.
    In both cases we conclude G(n-1)>=G(n-2) and thus B(n).
    Therefore, the proposition follows by induction. QED.

    As a consequence of the proposition we have found that one (not the)
    optimal strategy is
    to always ask for [n/2] of the remaining objects (this implies asking
    specifically for 1 of three objects).

    BTW, the actual expected number of questions needed for n>1 objects is
    F(n)/n.
    With a bit more analysis, one can show that
    ld(4/3*n)
    is a fairly good approximation for F(n)/n, in fact that
    F(n)/n >= ld(4/3 * n) for all n>1
    with equality if and only if n is 3 times a power of 2.

    PROBLEM:
    Determine
    sup { F(n)/n - ld(4/3 * n) }
     
    hagman, Jan 13, 2007
    #3
  4. Your two questions A and B result in the same amount of information
    gained. The only difference is that in situation B, you don't consider
    the game over, even though you have determined the answer is Larry.
    The second "guess" is not a guess at all. Mathematically, you discover
    the answer on the same turn of the game.

    I hope this was helpful,

    William Shanley
     
    William Shanley, Jan 22, 2007
    #4
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.