Clustering next-generation gene expression data
This program clusters digital gene expression data, such as RNA-seq or CAGE. It takes a table of tag counts as input and estimates the number and characteristics of the clusters supported by the data.
To do so, it uses a Hierarchical Dirichlet Process Mixture Model built around the Negative Binomial distribution, which models over-dispersed count data, combined with a blocked Gibbs sampler for efficient Bayesian learning in that model.
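
To illustrate what "over-dispersed" means here, the following is a minimal sketch of the Negative Binomial distribution in a mean/dispersion form. It is not taken from the program itself: the function names (`nb_log_pmf`, `nb_moments`), the parameter names `mu` and `phi`, and the choice of this particular parameterization are assumptions made for illustration only. Under this parameterization the variance is `mu + phi * mu**2`, so it always exceeds the mean, unlike a Poisson model where the two are equal.

```python
import math


def nb_log_pmf(k, mu, phi):
    """Log-probability of observing count k under a Negative Binomial
    with mean mu > 0 and dispersion phi > 0 (illustrative parameterization)."""
    r = 1.0 / phi          # "size" parameter
    p = r / (r + mu)       # success probability in the standard (r, p) form
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )


def nb_moments(mu, phi, k_max=2000):
    """Numerically sum the pmf over 0..k_max-1 and return
    (total probability, mean, variance)."""
    probs = [math.exp(nb_log_pmf(k, mu, phi)) for k in range(k_max)]
    total = sum(probs)
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return total, mean, var


if __name__ == "__main__":
    total, mean, var = nb_moments(mu=10.0, phi=0.5)
    print(f"total={total:.6f} mean={mean:.4f} variance={var:.4f}")
```

The numerically computed variance can be compared with `mu + phi * mu**2`. As `phi` approaches zero, the distribution approaches a Poisson with the same mean.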