
Apply Gibbs sampling to acquire overhypotheses about the feature variability for the bags of marbles model illustrated in Fig. 8: Suppose that is a stack containing many bags of marbles. We empty several bags and discover that the marbles within the same bag have certain features in common: For instance some bags may contain black marbles, others may contain white marbles, but that the marbles in each bag are uniform in color. Given a new bag  bag  and a single marble (e.g. a black marble) drawn from this bag we are interested in the probability of the colors of all other marbles within this bag. On its own, a single draw would provide little information about the contents of the new bag, but experience with previous bags may lead us to endorse certain hypothesis (e.g. all marbles in a bag have uniform colors).
Learning overhypothesis: The term overhypothesis is used to refer to any form of abstract knowledge that sets up a hypothesis space at a less abstract level. By this criterion, an overhypothesis sets up a space of hypotheses about the marbles in bag : they could be uniformly black, uniformly white, and so on. Hierarchical Bayesian models capture the notion of overhypothesis by allowing hypothesis spaces at several levels of abstraction. In this example we wish to explain how a certain kind of inference can be drawn from a given body of data. In this case, the data are observations of several bags and we are working with a set of 2 colors.
Bags of marbles model: Let indicate a set of observations of the marbles in bag . If we have drawn five marbles from bag 7 and all but one are black, then . We are interested in the ability to predict the color of the next marble to be drawn from bag . The first step is to identify a kind of knowledge (level 1 knowledge) that explains the data and that supports the ability of interest. In this case, level 1 knowledge is knowledge about the color distribution of each bag. Let indicate the true color distribution for the th bag in the stack. We assume that is drawn from a binomial distribution with parameter : in other words, the marbles responsible for the observations in are drawn independently at random from the th bag, and the color of each depends on the color distribution for that bag. If 60% of the marbles in bag 7 are black, then .
For the marbles scenario, level 2 knowledge is knowledge about the distribution of the variables. This knowledge is represented using two parameters, and . The vectors are drawn from a Beta distribution parameterized by a scalar and a vector with . The parameter determines the extent to which the colors in each bag tend to be uniform, and represents the distribution of colors across the entire collection of bags. We need to formalize our a priori expectations about the values of these variables.
Level 2 knowledge is acquired by relying on a body of knowledge at an even higher level, level 3. We use a uniform distribution on and an exponential distribution on , which captures a weak prior expectation that the marbles in any bag will tend to be uniform in color. The mean of the exponential distribution is , i.e. . The parameter and the pair are both overhypotheses, since each sets up a hypothesis space at the next level down. Since the level 3 knowledge is specified in advance ( ), you should analyze how an overhypothesis can be learned at level 2.
The joint probability distribution for this model is therefore given by
(1) 
(2)  
(3)  
(4)  
(5) 