Viral Ortholog Clustering


Ortholog Clustering

A modification to the OrthoMCL algorithm improves the performance of ortholog clustering for viral proteins.

Polyproteins and mature peptides may be clustered in the same group

BLAST, a local alignment method, produces incorrect ortholog groups in some cases, particularly when a long polyprotein overlaps many mature peptides, as shown in the illustration at the top.

Unequal Lengths

Because BLAST scores only the locally aligned region, the additional length of the polyprotein does not decrease the BLAST score. Each mature peptide therefore forms an ortholog group with the polyprotein, and through it with the other mature peptides.

We have modified the OrthoMCL algorithm to take the ratio of the gene lengths into account when determining the similarity between the subject and query genes.
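
As a rough illustration of the idea, the sketch below scales a BLAST score by the ratio of the shorter gene length to the longer one. The exact formula used in the modified algorithm is not given here, so the multiplicative weighting, the `BlastHit` record, and the function names are all assumptions made for this example, and the numbers are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One BLAST hit between a query and a subject gene (hypothetical record)."""
    query_id: str
    subject_id: str
    query_len: int    # full length of the query gene product, in residues
    subject_len: int  # full length of the subject gene product, in residues
    bit_score: float  # local-alignment score reported by BLAST


def length_ratio(hit: BlastHit) -> float:
    """Return shorter length / longer length, a value in (0, 1]."""
    shorter = min(hit.query_len, hit.subject_len)
    longer = max(hit.query_len, hit.subject_len)
    if shorter <= 0:
        raise ValueError("gene lengths must be positive")
    return shorter / longer


def adjusted_similarity(hit: BlastHit) -> float:
    """Scale the BLAST score by the length ratio (an assumed weighting)."""
    return hit.bit_score * length_ratio(hit)


# Illustrative values only: a mature peptide against a polyprotein ten times
# its length, and the same peptide against a peptide of similar length.
peptide_vs_polyprotein = BlastHit("mat_peptide_A", "polyprotein_1", 300, 3000, 600.0)
peptide_vs_peptide = BlastHit("mat_peptide_A", "mat_peptide_B", 300, 310, 580.0)

for hit in (peptide_vs_polyprotein, peptide_vs_peptide):
    print(hit.query_id, hit.subject_id, round(adjusted_similarity(hit), 1))
```

With a weighting of this kind, the peptide-to-polyprotein hit is penalized heavily even though its raw score is the higher of the two, while the hit between two peptides of similar length is nearly unchanged.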

Evaluating Results

The modification fixes the problem illustrated. To demonstrate this, we have developed two new metrics that use external information, in particular the virus strain and the gene name, to quantify the improvement in clustering performance.
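
The two metrics themselves are not defined in this summary. Purely to show how strain and gene-name annotations could be used to score a cluster, the sketch below computes two hypothetical measures: the share of members carrying the most common gene name, and the share of members whose strain is already represented in the cluster. The `Protein` record, both function names, and the example annotations are assumptions for illustration and should not be read as the metrics developed for this work.

```python
from collections import Counter
from typing import NamedTuple, Sequence


class Protein(NamedTuple):
    strain: str     # virus strain the protein comes from
    gene_name: str  # annotated gene name


def gene_name_purity(cluster: Sequence[Protein]) -> float:
    """Fraction of members that carry the cluster's most common gene name."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    counts = Counter(p.gene_name for p in cluster)
    return counts.most_common(1)[0][1] / len(cluster)


def strain_redundancy(cluster: Sequence[Protein]) -> float:
    """Fraction of members whose strain is already represented in the cluster."""
    if not cluster:
        raise ValueError("cluster must not be empty")
    distinct_strains = len({p.strain for p in cluster})
    return 1 - distinct_strains / len(cluster)


# Made-up annotations: a cluster that mixes a polyprotein with its mature
# peptides, and a cluster holding one gene from each of two strains.
mixed = [
    Protein("strain_A", "polyprotein"),
    Protein("strain_A", "NS3"),
    Protein("strain_A", "NS5"),
    Protein("strain_B", "NS3"),
]
clean = [Protein("strain_A", "NS3"), Protein("strain_B", "NS3")]

for name, cluster in (("mixed", mixed), ("clean", clean)):
    print(name, gene_name_purity(cluster), strain_redundancy(cluster))
```

Under these assumed definitions, a cluster that merges a polyprotein with several of its mature peptides scores low on purity and high on redundancy, which is the failure mode described above.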





National Institute of Allergy and Infectious Diseases (NIAID)