# Help with conjugate priors/Bayesian problem with a repeated urn

Discussion in 'Probability' started by Jesse Perla, Aug 15, 2011.

1. ### Jesse Perla (Guest)

Here is the problem that has crept up in my research, written as an
urn problem:

1) I have an urn with N total balls and K red balls
2) K is unknown, N is known
3) I set a prior on K (hopefully a conjugate prior), or potentially on
K/N, the probability of drawing a red ball.
4) There is a repeated Bernoulli draw from the urn.
4.a) If the ball was red, I take it out and replace it with a black
ball
4.b) If the ball was black, I put it back in the urn.
4.c) Regardless, I update to a posterior on K based on the success
of the draw.
4.d) But note that I have taken a red ball out and replaced it by a
black ball, so I somehow need to adjust my prior! This may be the
tricky part
5) This process is repeated, potentially until all of the red balls
are gone. The key variable of interest is to keep track of the prior
on the number of red balls left in the urn.
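The process in steps 1-5 is easy to simulate, which helps when checking candidate updating rules later. Here is a minimal Python sketch (the function name `run_urn` and the fixed trial horizon are my own choices, and the inference step 4.c is deliberately left out):

```python
import random

def run_urn(N, K, trials=100, seed=0):
    """Simulate steps 4.a-4.b: draw Bernoulli(k/N); on red, swap the
    red ball for a black one; on black, return the ball to the urn."""
    rng = random.Random(seed)
    k = K                      # current number of red balls in the urn
    draws = []
    for _ in range(trials):
        x = 1 if rng.random() < k / N else 0   # 1 = red, 0 = black
        if x == 1:
            k -= 1             # red removed, black put in its place
        draws.append(x)
        if k == 0:             # all red balls gone (step 5)
            break
    return draws, k
```

Note that the urn size N never changes; only the red count k does, which is part of why a small recursive state seems plausible.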
--------------------------------------------------------------------------------
Written sequentially, I think the state here can be summarized as
(note that N is fixed throughout):
1) 't', the number of trials that have occurred.
2) 'k_t', the number of red balls that I currently have at time 't'.
3) 'mu_t', the current prior distribution on either K or K/N.
Hopefully this is a conjugate prior to the Bernoulli trial. Let's
say that it is a two-parameter prior with parameters 'alpha_t' and
'beta_t'.

Written recursively (which I would strongly prefer), the recursive
state would hopefully just be (given a fixed N):
1) k, the current stock of red balls
2) mu, the prior, hopefully collapsed into parameters 'alpha' and
'beta'.
..... I hope/think it is possible that this is a sufficient recursive
state, since the prior may contain the sufficient information about
the number of trials (e.g. 'alpha + beta' for the Beta distribution).

For notation: let 'x \in {0,1}' be the outcome of the Bernoulli trial,
where x = 1 if the draw is red.

So the question is: what conjugate prior can I use, and what is the
updating procedure to get the posterior alpha' and beta' from alpha,
beta, N, k, and x (where a prime denotes the posterior, or next
period)?
---------------------------------------------------------------------
If we did NOT have the replacement of the black ball whenever a red
ball was taken out, I believe the following fairly standard approach
would work:
1) Set the prior as the beta distribution with parameters alpha and
beta.
2) Do repeated sampling of Bernoulli trials.
3) Update to the posterior, which works since the conjugate prior of
the Bernoulli is the Beta.
4) The recursive updating formula would be:

k' = k + x
alpha' = alpha + x
beta' = beta + 1 - x
where the prior at any point is distributed Beta(alpha, beta)
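In code, this no-adjustment case is just the textbook Beta-Bernoulli conjugate update (a sketch; the function name is my own):

```python
def beta_bernoulli_update(alpha, beta, k, x):
    """One conjugate update for a Bernoulli trial under a
    Beta(alpha, beta) prior; x = 1 for red, 0 for black;
    k counts the red draws so far."""
    return alpha + x, beta + 1 - x, k + x
```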
-------------------------------------------------------
So to add in the complication of taking out the red ball and replacing
it with a black one, it seems to me that we need to do a
transformation of the prior.
Here is one approach (that doesn't seem to work):
0) Let's say that the conjugate prior is still Beta(alpha, beta).
What exactly is this? It is a prior on the probability of drawing a
red ball, i.e. K/N ~ Beta(alpha, beta).
1) If a black ball is drawn, there is no difficulty. alpha' = alpha,
beta' = beta + 1
2) If a red ball is drawn...
2.a) First update to the posterior from the previous urn: alpha' =
alpha + 1, beta' = beta, i.e. the posterior is K/N ~ Beta(alpha + 1,
beta). Call this random variable X.
2.b) Next, do a transformation of the random variable. Here we
need to use N, since the size of the urn matters.
Aren't we looking for the prior on (K - 1)/N? If so, then isn't
this just K/N - 1/N = X - 1/N, where X ~ Beta(alpha + 1, beta)?
.... but this new distribution certainly doesn't look Beta to me, or
at least I couldn't figure it out. My apologies if I have done
something stupid here.

Also, is there a problem with discreteness here when using the beta
distribution? Note that if there are 3 balls in the urn and a red
ball is drawn, should the support of the posterior include p < 1/3?
It seems to me that a discrete conjugate prior on K directly would
make more sense, if such a thing exists.

I thought there was a chance that I could look at this problem
sequentially as a hypergeometric problem, combining all of the
samples up to the current period, but the problem is that
hypergeometric draws take out the red ball without replacing it with
a black one... And the beta-binomial urn problem involves adding in
2 red balls if one is observed...

------------------------------------------------------

Does anyone have any bright ideas on this? Is this a standard
problem, documented somewhere? Is there a key reference on this sort
of stuff for discrete data?

Jesse

Jesse Perla, Aug 15, 2011

2. ### Ken Butler (Guest)

One way of proceeding in a sequential process like this is to
proceed one ball (step) at a time, and use the posterior from the previous
step as the prior for the next step. After all, the posterior encapsulates
all your knowledge based on the experiment so far, so you'd want to feed
that back in as your prior (pre-experiment beliefs plus experimental
evidence so far) for the next stage. The quantity you are estimating at
each step is "the probability of drawing a red ball on *this* step".

If you knew how many red balls there were originally, this would be a
Markov chain.

Ken Butler, Aug 15, 2011

3. ### Ray Koopman (Guest)

All you get from t trials is the knowledge that the
urn originally contained at least k_t red balls and
now contains at least t black balls. In particular,
the probability distribution of the number of red
balls remaining in the urn is just the prior,
truncated at n-t and rescaled to sum to 1.

Let p[t,k] denote the probability that k red balls
remain in the urn after t trials. Your prior is p[0,k].

Then for t = 1,...,n, the updated distribution is

p[t,k] = p[t-1,k] / (1 - p[t-1,n-t+1]),  k = 1,...,n-t;
p[t,k] = 0,                              k = n-t+1,...,n.

After t trials, the posterior for the number of red balls
that were initially in the urn is P[t,k] = p[t,k-k_t];
i.e., the updated prior, shifted up k_t balls.

Ray Koopman, Aug 15, 2011
4. ### Ray Koopman (Guest)

That should be k = 0,...,n-t (not k = 1,...,n-t).

Ray Koopman, Aug 15, 2011
5. ### Jesse Perla (Guest)

Absolutely. I guess my problem is that I need to have a conjugate
prior so the posterior at each step is tractable. Any ideas on which
one might work?

Jesse Perla, Aug 15, 2011
6. ### Jesse Perla (Guest)

Thanks,
Is this true? For example, if I had a prior on the number of red
balls and I got a long sequence of black balls with no red balls,
wouldn't my posterior start to converge to put less and less weight on
higher numbers of red balls?

Jesse Perla, Aug 15, 2011
7. ### Ray Koopman (Guest)

You're right. After a single Bernoulli draw,
the posterior probability of k reds is proportional to

p'[k] = p[k] * (k/n)^x * (1-k/n)^(1-x),

where p[k] is the prior probability, and
x = 1 for red and 0 for black. Note that
p'[0] = 0 if x = 1, and p'[n] = 0 if x = 0.

Then adjust for replacement: p"[k] = p'[k + x].

Finally, normalize p" to get the p for
the number of reds remaining in the urn.
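The three steps above (reweight, shift, normalize) can be sketched directly as a recursive filter on the discrete distribution; the function name and list representation below are my own choices:

```python
def update(p, n, x):
    """One filtering step for the urn with red-to-black replacement.

    p[k] = current probability that k red balls remain in the urn,
           for k = 0..n.
    x    = 1 if the draw was red, 0 if black.
    Returns the updated distribution of reds remaining.
    """
    # Likelihood reweighting: p'[k] proportional to p[k] (k/n)^x (1-k/n)^(1-x)
    pp = [p[k] * (k / n if x == 1 else 1 - k / n) for k in range(n + 1)]
    # Replacement adjustment: p''[k] = p'[k + x]
    ppp = [pp[k + x] if k + x <= n else 0.0 for k in range(n + 1)]
    # Normalize
    z = sum(ppp)
    return [q / z for q in ppp]
```

For example, starting from a uniform prior on {0, 1, 2} with n = 2, a red draw gives update([1/3, 1/3, 1/3], 2, 1) = [1/3, 2/3, 0]: two initial reds is twice as likely as one, and at most one red can remain.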

Ray Koopman, Aug 15, 2011
8. ### Jesse Perla (Guest)

Thanks. Makes complete sense.

* I guess my biggest problem here is that I kind of need a conjugate
prior p_0[k] to kick things off, such that the parameterization of
p_0 is relatively small (say, < 5 parameters would be computationally
tractable for me) and, most importantly, is independent of n.
* Why? My problem is essentially a recursive Bayesian learning problem
written in dynamic programming (where n might be fairly large), so it
is intractable for the prior to be outside of a finite parameter
family as otherwise the dynamic state suffers from the curse of
dimensionality as 'n' increases.
* To start simple:
  * Let's say there was only 1 draw and we didn't have to remove the
red ball at the end. Is there a discrete-valued prior on 'k' that is
conjugate? For the Bernoulli, they always talk about the Beta, which
I am pretty sure won't work for this problem due to the inherent
discreteness here.
  * Then, with this prior, is there a way that the 'adjustment' of
taking out the red ball keeps it in the same family?
  * If those two are successful, then we have our conjugate prior.

Jesse Perla, Aug 16, 2011
9. ### Ray Koopman (Guest)

With replacement, but without changing red to black,
the unnormalized posterior after t trials would be

p_t[k] = p_0[k] * k^k_t * (n-k)^(t - k_t)

If p_0[k] is proportional to k^a (n-k)^b then

p_t[k] = k^(a + k_t) * (n-k)^(b + t - k_t),

which (as you noted) is an integer version of a beta distribution.

However, changing red to black makes the posterior depend on the
sequence of draws, not just on k_t. For n = 4 and 4 trials, here is
the prior and the 16 possible draw sequences with their corresponding
unnormalized posteriors. All the posteriors are distinct.

draws        p[0]        p[1]        p[2]       p[3]      p[4]

0 0 0 0    256 p[0]    81 p[1]    16 p[2]     p[3]      0
0 0 0 1     27 p[1]    16 p[2]     3 p[3]     0         0
0 0 1 0     36 p[1]    24 p[2]     6 p[3]     0         0
0 1 0 0     48 p[1]    36 p[2]    12 p[3]     0         0
1 0 0 0     64 p[1]    54 p[2]    24 p[3]     4 p[4]    0
0 0 1 1      8 p[2]     6 p[3]     0          0         0
0 1 0 1     12 p[2]    12 p[3]     0          0         0
0 1 1 0     16 p[2]    18 p[3]     0          0         0
1 0 0 1     18 p[2]    24 p[3]    12 p[4]     0         0
1 0 1 0     24 p[2]    36 p[3]    24 p[4]     0         0
1 1 0 0     32 p[2]    54 p[3]    48 p[4]     0         0
0 1 1 1      6 p[3]     0          0          0         0
1 0 1 1     12 p[3]    24 p[4]     0          0         0
1 1 0 1     18 p[3]    48 p[4]     0          0         0
1 1 1 0     24 p[3]    72 p[4]     0          0         0
1 1 1 1     24 p[4]     0          0          0         0
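The table is straightforward to reproduce by brute force. A small Python check (function name mine, using exact rational arithmetic; entry k of each row is the coefficient on the prior p[k + number of reds drawn]) confirms both the coefficients and the claim that all 16 posteriors are distinct:

```python
from fractions import Fraction
from itertools import product

def posterior_coeffs(seq, n=4):
    """Unnormalized posterior over reds REMAINING after draw sequence
    seq, scaled by n^len(seq) to match the table above."""
    out = []
    for k in range(n + 1):            # k = reds remaining at the end
        K = k + sum(seq)              # implied initial red count
        like = Fraction(0)
        if K <= n:
            kk, like = K, Fraction(1)
            for x in seq:
                # likelihood of this draw given kk reds currently in the urn
                like *= Fraction(kk, n) if x else Fraction(n - kk, n)
                kk -= x               # red draw removes a red ball
            out.append(like * n ** len(seq))
        else:
            out.append(like)
    return out

rows = [tuple(posterior_coeffs(seq)) for seq in product((0, 1), repeat=4)]
assert len(set(rows)) == 16                          # all posteriors distinct
assert posterior_coeffs((0, 0, 0, 0)) == [256, 81, 16, 1, 0]
assert posterior_coeffs((1, 0, 0, 0)) == [64, 54, 24, 4, 0]
```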

Ray Koopman, Aug 16, 2011
10. ### Jesse Perla (Guest)

Interesting. It appears that we would have to do Bayesian updating at
every step (with a conjugate prior) in order to deal with the
sequential nature of this. For me, this isn't a problem as I am
looking for a recursive filter.

A few rough notes:
* I have been playing around with a beta-binomial prior on 'k' at
each stage, and it does *appear* to be conjugate to the Bernoulli
trials (at least before removing the red ball).
* For example, it looks like we can just increment the alpha or beta
parameter of the beta-binomial prior to get the posterior. We may
also need to conditionally decrement the 'n' parameter as well. When
used correctly in Bayes' rule, it also properly changes the support
of the posterior to take into account whether any black or red balls
have been drawn.
* I haven't figured out the exact transformation after removing the
red ball, but I think there is a chance that it is simply a question
of changing which 'k' we evaluate the posterior at after removing
the ball.
  * i.e. if the prior is \mu(k) and the posterior is \mu'(k), then
the support of the posterior is {1, ..., N}.
  * Hence to evaluate the posterior before removing a ball we need
to use k' = k - 1 before putting it into the pdf of the
beta-binomial.
  * Sorry if this doesn't make any sense... the supports here are
tricky. See the next statement to isolate this problem from the
Bayesian updating.

So assuming that a betabinomial is conjugate for the Bernoulli trial
here, I think we can look at removing the ball as a separate step to
see if it still is betabinomial. To pose as a question:
* Assume we have a random variable Y with support {1, ..., N}.
* The pdf of Y at k is PDF[BetaBinomial[alpha, beta, N-1], k - 1]...
note that when we evaluate the PDF we have to subtract 1 to convert
to the support {0, ..., N-1}.
* I believe the expectation of this is Expec(Y) = (N-1)alpha/(alpha
+ beta) + 1, where the +1 comes from the change in support.
* Then what happens if we remove a ball? Are we simply looking at the
pdf of a new random variable Y' = Y - 1?
* If so, then is this still BetaBinomial, and what are the parameters
alpha, beta, n... and what 'k' do we evaluate it at?
* One guess may be that pdf(Y') = PDF[BetaBinomial[alpha, beta, N-1],
k], where we have simply changed which 'k' we evaluate it at.
* We should be able to check the expectations to see if we are on
the right track. Note that by linearity of the expectation operator:
Expec(Y') = Expec(Y) - 1 = (N-1)alpha/(alpha + beta) + 1 - 1 =
Expec(BetaBinomial[alpha,beta,N-1])
* Hence this guess would at least pass the expectation test.
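The expectation test above is easy to run numerically (a sketch with my own helper names; the pmf is the standard beta-binomial one, C(n,k) B(k+a, n-k+b) / B(a,b), and N, alpha, beta are example values):

```python
from math import comb, lgamma, exp

def lbeta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """BetaBinomial(a, b, n) pmf on the support {0, ..., n}."""
    return comb(n, k) * exp(lbeta(k + a, n - k + b) - lbeta(a, b))

N, a, b = 10, 2.0, 3.0

# Y on {1, ..., N}: pmf(Y = k) is the BetaBinomial pmf at k - 1, n = N - 1
EY = sum(k * betabinom_pmf(k - 1, N - 1, a, b) for k in range(1, N + 1))
assert abs(EY - ((N - 1) * a / (a + b) + 1)) < 1e-9  # (N-1)alpha/(alpha+beta) + 1

# Shifted variable Y' = Y - 1: the same pmf evaluated at k instead of k - 1
EYp = sum(k * betabinom_pmf(k, N - 1, a, b) for k in range(N))
assert abs(EYp - (EY - 1)) < 1e-9
```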

Jesse Perla, Aug 17, 2011
11. ### Ray Koopman (Guest)

Is this the original problem, or are you changing it
so that there is always at least one red ball initially?

Ray Koopman, Aug 18, 2011