Help with conjugate priors/Bayesian problem with a repeated urn

Discussion in 'Probability' started by Jesse Perla, Aug 15, 2011.

  1. Jesse Perla

    Jesse Perla Guest

    Here is the problem that has crept up in my research, written as an
    urn problem:

    1) I have an urn with N total balls and K red balls
    2) K is unknown, N is known
    3) I set a prior on K (hopefully a conjugate prior) or potentially on
    K/N, the probability to draw a red ball...
    4) There is a repeated bernoulli draw from the urn.
    4.a) If the ball was red, I take it out and replace it with a black
    4.b) If the ball was black, I put it back in the urn.
    4.c) Regardless, I update to a posterior on K based on the success
    of the draw.
    4.d) But note that I have taken a red ball out and replaced it by a
    black ball, so I somehow need to adjust my prior! This may be the
    tricky part
    5) This process is repeated, potentially until all of the red balls
    are gone. The key variable of interest is to keep track of the prior
    on the number of red balls left in the urn.
    Written sequentially, I think the state here can be summarized as
    (note that N is fixed throughout):
    1) `t', the number of trials that have occurred.
    2) 'k_t' the number of red balls that I currently have at time 't'
    3) 'mu_t', the current prior distribution on either K or K/N.
    Hopefully this is a conjugate prior to the bernoulli trial. Lets say
    that it is a two parameter prior, 'alpha_t' and 'beta_t'

    Written recursively (which I would strongly prefer), the recursive
    state would hopefully just be (given a fixed N):
    1) k, the current stock of red balls
    2) mu, the prior, hopefully collapsed into parameters 'alpha' and
    ..... I hope/think it is possible that this is a sufficient recursive
    state since the prior may contain the sufficient information of the
    number of trials (e.g. 'alpha + beta' for the Beta distribution)

    For notation: Let 'x \in {0,1}' be the success of the Bernoulli trial
    where x = 1 if red.

    So the question is: what conjugate prior can I use and what is the
    updating procedure to get the posterior alpha' and beta' from alpha,
    beta, N, k, and x. (given that a prime denotes the posterior or next
    If we did NOT have the replacement of the black ball whenever a red
    ball was taken out, I believe the following fairly standard approach
    would work:
    1) Set the prior as the beta distribution with parameters alpha and
    2) Do a repeated sampling with the Bernoulli trial.
    3) Update to the posterior, which works since the conjugate prior of
    the Bernoulli is the Beta.
    4) The recursive updating formula would be:

    k' = k + x
    alpha' = alpha + x
    beta' = beta + 1 - x
    where the prior at any point is distributed Beta(alpha, beta)
    So to add in the complication of taking out the red ball and replacing
    it with a black one, it seems to me that we need to do a
    transformation of the prior.
    Here is one approach (that doesn't seem to work):
    0) Lets say that the conjugate prior is still Beta(alpha, beta).
    What exactly is this? It is a prior on the probability of drawing a
    red ball. i.e. K/N ~ Beta(alpha, beta)
    1) If a black ball is drawn, there is no difficulty. alpha' = alpha,
    beta' = beta + 1
    2) If a red ball is drawn...
    2.a) First update to the posterior from the previous urn. alpha' =
    alpha + 1, beta' = beta. i.e. Posterior K/N ~ Beta(alpha + 1,
    beta). Let this be a random variable called X
    2.b) Next, do a transformation of the random variable. Here we
    need to use the N since the size of the urn matters.
    Aren't we looking for what the prior on (K - 1)/N is? If so, then
    isn't this just K/N - 1/N = X - 1/N where X ~ Beta(alpha + 1, beta)
    .... but this new distribution sure doesn't look beta to me, or at
    least I couldn't figure it out. My apologies if I have done something
    stupid here.

    Also, is there a problem with discreteness here when using the beta
    distribution? Note that if there are 3 balls in the urn, and a red
    ball is drawn, should the support of the posterior include p < 1/3?
    It seems to me that a discrete conjugate prior on K directly makes
    more sense if such things exist?

    I thought there was a chance that I could look at this problem
    sequentially as a hyper-geometric problem combining all of the samples
    up to the current period, but the problem is that hypergeometric draws
    take out the red ball, but don't replace it with a black ball... And
    the beta-binomial urn problem involves adding in 2 red balls if one is


    Does anyone have any bright ideas on this? Is this a standard
    problem, documented somewhere? Is there a key reference on this sort
    of stuff for discrete data?

    Thanks for your help,
    Jesse Perla, Aug 15, 2011
    1. Advertisements

  2. Jesse Perla

    Ken Butler Guest

    One way of proceeding in a sequential process like this is to
    proceed one ball (step) at a time, and use the posterior from the previous
    step as the prior for the next step. After all, the posterior encapsulates
    all your knowledge based on the experiment so far, so you'd want to feed
    that back in as your prior (pre-experiment beliefs plus experimental
    evidence so far) for the next stage. The quantity you are estimating at
    each step is "the probability of drawing a red ball on *this* step".

    If you knew how many red balls there were originally, this would be a
    Markov chain.
    Ken Butler, Aug 15, 2011
    1. Advertisements

  3. Jesse Perla

    Ray Koopman Guest

    All you get from t trials is the knowledge that the
    urn originally contained at least k_t red balls and
    now contains at least t black balls. In particular,
    the probability distribution of the number of red
    balls remaining in the urn is just the prior,
    truncated at n-t and rescaled to sum to 1.

    Let p[t,k] denote the probability that k red balls
    remain in the urn after t trials. Your prior is p[0,k].

    Then for t = 1,...,n, the updated distribution is
    p[t,k] = p[t-1,k]/(1-p[t-1,n-t+1]), k = 1,...,n-t ;
    = 0 , k = n-t+1,...,n.

    After t trials, the posterior for the number of red balls
    that were initially in the urn is P[t,k] = p[t,k-k_t];
    i.e., the updated prior, shifted up k_t balls.
    Ray Koopman, Aug 15, 2011
  4. Jesse Perla

    Ray Koopman Guest

    That should be...................... k = 0,...,n-t
    Ray Koopman, Aug 15, 2011
  5. Jesse Perla

    Jesse Perla Guest

    Absolutely. I guess my problem is that I need to have a conjugate
    prior so the posterior at each step is tractable. Any ideas on which
    one might work?
    Jesse Perla, Aug 15, 2011
  6. Jesse Perla

    Jesse Perla Guest

    Is this true? For example, if I had a prior on the number of red
    balls and I got a long sequence of black balls with no red balls,
    wouldn't my posterior start to converge to put less and less weight on
    higher numbers of red balls?
    Jesse Perla, Aug 15, 2011
  7. Jesse Perla

    Ray Koopman Guest

    You're right. After a single bernoulli draw,
    the posterior probability of k reds is proportional to

    p'[k] = p[k] * (k/n)^x * (1-k/n)^(1-x),

    where p[k] is the prior probability, and
    x = 1 for red and 0 for black. Note that
    p'[0] = 0 if x = 1, and p'[n] = 0 if x = 0.

    Then adjust for replacement: p"[k] = p'[k + x].

    Finally, normalize p" to get the p for
    the number of reds remaining in the urn.
    Ray Koopman, Aug 15, 2011
  8. Jesse Perla

    Jesse Perla Guest

    Thanks. Makes complete sense.

    * I guess my biggest problem here is that I kind of need a conjugate
    prior to kick things off p_0[k] such that the parameterization of p_0
    is relatively small (say < 5 parameters would be computationally
    tractable for me) and most importantly is independent of n.
    * Why? My problem is essentially a recursive Bayesian learning problem
    written in dynamic programming (where n might be fairly large), so it
    is intractable for the prior to be outside of a finite parameter
    family as otherwise the dynamic state suffers from the curse of
    dimensionality as 'n' increases.
    * To start simple:
    * Lets say there was only 1 draw and we didn't have to remove the
    red at the end. Is there a discrete valued prior on 'k' that is
    conjugate for the Bernoulli draws? Every time I read about conjugacy
    for Bernoulli, they always talk about the Beta, which I am pretty sure
    won't work for this problem due to the inherent discreteness here.
    * Then, with this prior is there a way that the 'adjustment' of
    taking out the red ball keeps it in the same family?
    * If those two are successful, then we have our conjugate prior.
    Jesse Perla, Aug 16, 2011
  9. Jesse Perla

    Ray Koopman Guest

    With replacement, but without changing red to black,
    the unnormalized posterior after t trials would be

    p_t[k] = p_0[k] * k^k_t * (n-k)^(t - k_t)

    If p_0[k] is proportional to k^a (n-k)^b then

    p_t[k] = k^(a + k_t) * (n-k)^(b + t - k_t),

    which (as you noted) is an integer version of a beta dsitribution.

    However, changing red to black makes the posterior depend on the
    sequence of draws, not just on k_t. For n = 4 and 4 trials, here is
    the prior and the 16 possible draw sequences with their corresponding
    unnormalized posteriors. All the posteriors are distinct.

    p[0] p[1] p[2] p[3] p[4]

    0 0 0 0 256 p[0] 81 p[1] 16 p[2] p[3] 0
    0 0 0 1 27 p[1] 16 p[2] 3 p[3] 0 0
    0 0 1 0 36 p[1] 24 p[2] 6 p[3] 0 0
    0 1 0 0 48 p[1] 36 p[2] 12 p[3] 0 0
    1 0 0 0 64 p[1] 54 p[2] 24 p[3] 4 p[4] 0
    0 0 1 1 8 p[2] 6 p[3] 0 0 0
    0 1 0 1 12 p[2] 12 p[3] 0 0 0
    0 1 1 0 16 p[2] 18 p[3] 0 0 0
    1 0 0 1 18 p[2] 24 p[3] 12 p[4] 0 0
    1 0 1 0 24 p[2] 36 p[3] 24 p[4] 0 0
    1 1 0 0 32 p[2] 54 p[3] 48 p[4] 0 0
    0 1 1 1 6 p[3] 0 0 0 0
    1 0 1 1 12 p[3] 24 p[4] 0 0 0
    1 1 0 1 18 p[3] 48 p[4] 0 0 0
    1 1 1 0 24 p[3] 72 p[4] 0 0 0
    1 1 1 1 24 p[4] 0 0 0 0
    Ray Koopman, Aug 16, 2011
  10. Jesse Perla

    Jesse Perla Guest

    Interesting. It appears that we would have to do Bayesian updating at
    every step (with a conjugate prior) in order to deal with the
    sequential nature of this. For me, this isn't a problem as I am
    looking for a recursive filter.

    A few rough notes:
    * I have been playing around with a Beta-binomial prior on 'k' at
    each stage, and it does *appear* to be conjugate to the Bernoulli
    trials (at least before removing the red ball).
    * For example, it looks like we can just increment the alpha or beta
    parameter of the betabinomial prior to get the posterior. We may also
    need to conditionally decrement the 'n' parameter as well. When used
    correctly in bayes rule, it also properly changes the support of the
    posterior to take into account whether there are any black or red
    balls displayed.
    * I haven't figured out the exact transformation after removing the
    red ball, but I think there is a chance that it is simply a question
    of changing which 'k' we are evaluating the posterior at after
    removing the ball
    * i.e. if the prior is \mu(k) and the posterior is \mu'(k), then
    the support of the posterior is {1, ..., N}.
    * Hence to evaluate the posterior before removing a ball we need
    to use k' = k - 1 before putting it into the pdf of the betabinomial.
    * Sorry if this doesn't make any sense... the supports here are
    tricky. See the next statement to isolate this problem from the
    bayesian updating.

    So assuming that a betabinomial is conjugate for the Bernoulli trial
    here, I think we can look at removing the ball as a separate step to
    see if it still is betabinomial. To pose as a question:
    * Assume we have a random variable Y with support = {k=1, ... N}
    * The pdf for Y at k is: PDF[BetaBinomial[alpha, beta, N-1], k - 1]...
    note we when we evaluate the PDF we have to remove 1 to convert to a
    support to {0, ... N-1}
    * I believe the expectation of this is Expec(Y) = (N-1)alpha/(alpha
    + beta) + 1, where the +1 comes from the change in support.
    * Then what happens if we remove a ball? Are we simply looking at the
    pdf of a new random variable: Y' = Y - 1?
    * If so, then is this still BetaBinomial, and what are the parameters
    alpha, beta, n... and what 'k' do we evaluate it at?
    * One guess may be that pdf(Y') = PDF[BetaBinomoial[alpha, beta, N-1],
    k] where we have simply changed which 'k' we evaluate it at.
    * We should be able to check the expectations to see if we are on
    the right track. Note that by linearity of the expectation operator:
    Expec(Y') = Expec(Y) - 1 = (N-1)alpha/(alpha + beta) + 1 - 1 =
    * Hence this guess would at least pass the expectation test.
    Jesse Perla, Aug 17, 2011
  11. Jesse Perla

    Ray Koopman Guest

    Is this the original problem, or are you changing it
    so that there is always at least one red ball initially?
    Ray Koopman, Aug 18, 2011
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.