Gene Families

Two methods were used to cluster proteins into families. The gene set excludes genes classified as transposable elements (TEs).

TribeMCL first runs an all-versus-all BLAST search of the protein set, converts the resulting similarity scores into a matrix, and clusters that matrix with the Markov Cluster (MCL) algorithm to define families. The ten largest families are:

Family ID  No. of members  Name
---------  --------------  -------------------------------------------
1          861             Receptor-like protein kinase
2          840             Disease resistance-like protein
3          732             Unknown protein/Cytochrome P450
4          564             Leucine Rich Repeat family protein
5          335             Helicase-like protein
6          331             Cytochrome P450
7          325             NBS-LRR disease resistance protein
8          317             Pentatricopeptide repeat-containing protein
9          283             Pentatricopeptide repeat-containing protein
10         237             1-aminocyclopropane-1-carboxylate oxidase
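The TribeMCL idea can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the protein IDs and E-values are invented, edge weights are assumed to be -log10(E-value) averaged over both BLAST directions, and the inflation value of 2.0 and the self-loop convention are illustrative choices. Expansion (squaring the column-stochastic matrix) spreads flow through the similarity graph, inflation (raising entries to a power and renormalising) sharpens strong flows, and families are read off as the connected components of the converged matrix.

```python
import math

# Hypothetical all-vs-all BLASTP hits as (query, subject, E-value) tuples.
# In a real run these would be parsed from tabular BLAST output.
HITS = [
    ("pA", "pB", 1e-80), ("pB", "pA", 1e-78),
    ("pA", "pC", 1e-60), ("pC", "pA", 1e-62),
    ("pB", "pC", 1e-70), ("pC", "pB", 1e-70),
    ("pD", "pE", 1e-90), ("pE", "pD", 1e-88),
]


def similarity_matrix(hits, max_weight=200.0):
    """Build a symmetric weight matrix from -log10(E-value) scores.

    Each directed hit adds half its weight to both cells, so a reciprocal
    pair ends up with the mean of its two scores. E-values of 0 are capped
    at max_weight (an assumed ceiling).
    """
    ids = sorted({q for q, _, _ in hits} | {s for _, s, _ in hits})
    index = {p: i for i, p in enumerate(ids)}
    n = len(ids)
    m = [[0.0] * n for _ in range(n)]
    for q, s, evalue in hits:
        w = max_weight if evalue == 0 else min(max_weight, -math.log10(evalue))
        m[index[q]][index[s]] += w / 2
        m[index[s]][index[q]] += w / 2
    for i in range(n):
        # Self-loops set to the row maximum: a common MCL preprocessing step.
        m[i][i] = max(m[i]) or 1.0
    return ids, m


def normalize_columns(m):
    """Scale every column to sum to 1, giving a column-stochastic matrix."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(m, inflation=2.0, iterations=50):
    """Alternate expansion (matrix squaring) and inflation (element-wise power)."""
    n = len(m)
    m = normalize_columns([row[:] for row in m])
    for _ in range(iterations):
        # Expansion: flow spreads along paths of length two.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strong flows are boosted, weak ones shrink toward zero.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    return m


def extract_clusters(ids, m, eps=1e-9):
    """Families are the connected components of the converged flow graph."""
    parent = list(range(len(ids)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(ids)):
        for j in range(len(ids)):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for k, protein in enumerate(ids):
        groups.setdefault(find(k), []).append(protein)
    # Largest families first.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


ids, weights = similarity_matrix(HITS)
families = extract_clusters(ids, mcl(weights))
print(families)  # [['pA', 'pB', 'pC'], ['pD', 'pE']]
```

Because the two groups share no BLAST hits, they come out as separate families. In MCL, raising the inflation value generally yields smaller, tighter clusters, so the family sizes reported by such a pipeline depend on that setting.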

The JCVI Paralogous Families pipeline clusters proteins into families based on domain composition. Domains are first identified by HMM searches and then by BLAST homology, and all members of a family share the same domain architecture. The ten largest families are:

Family ID  No. of members  Name
---------  --------------  -------------------------------------------
1          562             Unknown protein
2          304             Receptor-like protein kinase
3          247             Unknown protein
4          168             F-box/kelch-repeat protein
5          155             Helicase-like protein
6          150             Unknown protein
7          150             CCP
8          132             Unknown protein
9          128             Pentatricopeptide repeat-containing protein
10         121             Pentatricopeptide repeat-containing protein
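Grouping by domain architecture can be sketched as follows, assuming domain calls have already been made (the HMM and BLAST search steps are not reproduced). The protein IDs and domain lists are hypothetical, and two behaviours are assumptions rather than documented features of the JCVI pipeline: tandem repeats of the same domain (for example several LRRs in a row) can optionally be collapsed into one, and groups with fewer than two members are dropped.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical domain annotations: protein ID -> domains in N-to-C order.
# IDs and domain names are illustrative only.
DOMAINS = {
    "prot1": ["LRR", "LRR", "Pkinase"],
    "prot2": ["LRR", "Pkinase"],
    "prot3": ["LRR", "LRR", "Pkinase"],
    "prot4": ["F-box", "Kelch"],
    "prot5": ["F-box", "Kelch"],
    "prot6": ["PPR"],
}


def architecture(domains, collapse_repeats=True):
    """Ordered domain architecture; optionally merge tandem repeats (LRR-LRR -> LRR)."""
    if collapse_repeats:
        return tuple(name for name, _ in groupby(domains))
    return tuple(domains)


def families_by_architecture(domain_map, collapse_repeats=True, min_size=2):
    """Group proteins that share an identical domain architecture."""
    groups = defaultdict(list)
    for protein, domains in domain_map.items():
        if domains:  # proteins with no detected domain are left unassigned here
            groups[architecture(domains, collapse_repeats)].append(protein)
    families = [(arch, sorted(members)) for arch, members in groups.items()
                if len(members) >= min_size]  # assumed: a family needs 2+ members
    # Largest families first, ties broken by architecture name.
    return sorted(families, key=lambda f: (-len(f[1]), f[0]))


for arch, members in families_by_architecture(DOMAINS):
    print("-".join(arch), members)
```

With the default settings this groups prot1, prot2 and prot3 as LRR-Pkinase and prot4 and prot5 as F-box-Kelch, while prot6 is dropped as a singleton. With collapse_repeats=False, prot2 no longer matches prot1 and prot3 because its LRR count differs; whether repeat copy number counts as part of the architecture in the real pipeline is not stated here.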