Title: Density Modeling and Clustering Using Dirichlet Diffusion Trees
I introduce a family of prior distributions over multivariate distributions, based on the use of a 'Dirichlet diffusion tree' to generate exchangeable data sets. These priors can be viewed as generalizations of Dirichlet processes and of Dirichlet process mixtures, but unlike simple mixtures, they can capture the hierarchical structure present in many distributions, by means of the latent diffusion tree underlying the data. This latent tree also provides a hierarchical clustering of the data, which, unlike ad hoc clustering methods, comes with probabilistic indications of uncertainty. The relevance of each variable to the clustering can also be determined. Although Dirichlet diffusion trees are defined in terms of a continuous-time process, posterior inference involves only finite-dimensional quantities, allowing computation to be performed by reasonably efficient Markov chain Monte Carlo methods. The methods are demonstrated on problems of modeling a two-dimensional density and of clustering gene expression data.