The SUPERFAMILY database contains assignments of protein domains of known structure to all completely sequenced genomes. The library of hidden Markov models and the software required to run the SUPERFAMILY assignment procedure are available from the downloads page. There is a very low-traffic mailing list for notification of updates and changes.

Domain-centric Gene Ontology

dcGO is a domain-centric solution to function prediction and functional genomics. Gene Ontology terms have been mapped to SUPERFAMILY domains, supra-domains and architectures. The method has been assessed for its ability to predict GO terms on sequences in the CAFA competition. dcGO also contains Phenotype Ontology terms and Anatomy Ontology terms mapped from a dozen or so ontologies. To benefit the wider bioinformatics community, an open-source package, dcGOR, has also been developed under the R software environment.

Database of disordered protein predictions

D2P2 is a community resource for pre-computed disorder predictions over a large library of known amino acid sequences. Goals of the database include making statistical comparisons of the various prediction methods freely available to the prediction community, as well as facilitating biological investigation of the disordered protein space.

SNP/SNV functional and phenotypic analysis

The Functional Analysis Through Hidden Markov Models (FATHMM) software and server are capable of predicting the functional and phenotypic consequences of protein missense variants using hidden Markov models (HMMs) representing the alignment of homologous sequences and conserved protein domains.

Coiled Coil prediction in sequences

Spiricoil is a website for Coiled Coil prediction in sequences. It also includes the option to view detected coiled coils mapped on known structures and even generate 3D models and view them. Spiricoil also includes the assignment of Coiled Coils to all completely sequenced genomes.

Vector Graphics tree plotting

TreeVector is a utility to create and integrate phylogenetic trees as Scalable Vector Graphics (SVG) files. One of the main purposes of TreeVector is to move away from treating phylogenetic trees as an end point and final graphic, and instead to embed them in dynamic processes using web standard technologies, so that quick reference to a particular pattern or trait is possible, dynamic and up to date.

HMM profile-profile comparison

This was the work of Martin Madera whilst in the group. He wrote PRC, the PRofile Comparer, a stand-alone program for aligning and scoring two profile hidden Markov models. It can also handle PSI-BLAST profiles.

Clustering next-generation gene expression data

DGEclust is a program for clustering digital gene expression data, such as RNA-seq, CAGE, etc. It takes as input a table of tag counts and it estimates the number and characteristics of the clusters supported by the data. This is achieved using a Hierarchical Dirichlet Process Mixture Model built around the Negative Binomial Distribution for modelling over-dispersed count data, combined with a blocked Gibbs sampler for efficient Bayesian learning in said model.
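To illustrate why the Negative Binomial distribution suits over-dispersed counts, the following minimal Python sketch evaluates its log-probability in a mean/dispersion parameterisation. The function names and the parameterisation are assumptions made for illustration and are not taken from DGEclust itself. The point is that the variance, mean + dispersion × mean², exceeds the mean whenever the dispersion is positive, unlike a Poisson model, where the two are equal.

```python
import math

def nb_log_pmf(k, mean, dispersion):
    """Log-probability of observing count k under a Negative Binomial
    with the given mean and dispersion (hypothetical helper)."""
    r = 1.0 / dispersion      # "size" parameter
    p = r / (r + mean)        # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

def nb_variance(mean, dispersion):
    """Variance implied by the mean/dispersion parameterisation."""
    return mean + dispersion * mean ** 2

# A tag with mean count 10 and dispersion 0.5 has variance 60,
# six times what a Poisson model with the same mean would allow.
print(nb_variance(10.0, 0.5))
```

A clustering model of this kind would evaluate such a likelihood for every tag count under each candidate cluster's parameters. The hierarchical prior and the blocked Gibbs sampler are outside the scope of this sketch.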

A supra-hexagonal map for omics data analysis

supraHex is an R/Bioconductor package for training, analysing and visualising tabular omics data. With supraHex, users can easily and intuitively carry out integrated tasks such as simultaneous analysis of gene clustering and sample correlation, and the overlaying of additional data (if any) onto the trained map for multilayer omics data comparisons.

Dynamic networks via integrative analysis

dnet is an R package that provides integrative analysis of network, expression, evolution and ontology data: i) identification of expression-active dynamic subnetworks; ii) network-based sample stratifications and visualisations on a 2D landscape; and iii) enrichment analysis using phylostratific age information and a variety of ontologies (respecting their directed acyclic graph structure) in many common organisms.

A Proteome Quality Index

PQI is a measure of proteome quality available from a comprehensive database of downloadable proteomes. Completely sequenced genomes for which there is an available set of protein sequences (the proteome) are given a 5-star rating supported by 11 different metrics of quality. PQI is a constantly updated web resource that currently includes over 3,200 annotated proteomes from multiple providers, including all entries from NCBI and ENSEMBL, and is powered by the SUPERFAMILY database.