Medicago truncatula re-assembly using an Optical Map

Pseudomolecule assembly was greatly aided by an optical map constructed by David Schwartz and colleagues at the Laboratory for Molecular and Computational Genomics at the University of Wisconsin, Madison.

Pseudomolecule re-assembly procedure

Mt3.0 pseudomolecules were rearranged to conform as closely as possible to the optical map. The process began by incorporating, using the MUMmer approach, updates to existing BAC sequences and the BACs newly sequenced since the initial data freeze. The resulting pseudomolecule sequences were then aligned with the optical map contigs, with the goal of eliminating, or at least minimizing, crossings between the lines that define the alignment blocks.
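One simple way to quantify how far a pseudomolecule departs from the optical map is to count crossing alignment lines. The minimal sketch below assumes each aligned block has been reduced to a single (pseudomolecule position, optical map position) pair; two lines cross when their order on one axis is the reverse of their order on the other. The function name and input format are hypothetical and illustrative, not part of the original pipeline.

```python
from itertools import combinations


def count_crossings(anchors):
    """Count pairs of alignment lines that cross.

    anchors: list of (pseudomolecule_pos, optical_map_pos) tuples, one per
    aligned block (a hypothetical, simplified representation).
    """
    crossings = 0
    for (p1, m1), (p2, m2) in combinations(anchors, 2):
        # The two lines cross when the blocks are ordered differently
        # on the pseudomolecule and on the optical map.
        if (p1 - p2) * (m1 - m2) < 0:
            crossings += 1
    return crossings


# Collinear blocks: no crossings.
print(count_crossings([(0, 0), (100, 120), (200, 230)]))  # 0
# The middle two blocks are swapped relative to the map: one crossing.
print(count_crossings([(0, 0), (100, 230), (200, 120), (300, 400)]))  # 1
```

A move or re-orientation that lowers this count brings the pseudomolecule closer to the map; a count of zero means the block order is fully collinear.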

The re-assembly process consisted of the following steps, repeated until no further moves were possible:

  • Identify contigs that need to be moved or re-oriented
  • Do not break robust sequence contigs (but check dubious overlaps)
  • Do not break robust scaffolds (but check dubious BAC end joins)
  • Generate new pseudomolecule from new AGP file
  • Re-align reconstructed pseudomolecule with optical map
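The "generate new pseudomolecule from new AGP file" step can be sketched as follows. This is a minimal illustration assuming the standard tab-separated AGP layout (column 5 is the component type; gap rows of type N or U give the gap length in column 6; component rows give the component ID, 1-based inclusive start and end, and orientation in columns 6 to 9). The function names, the `components` dictionary and the example IDs `bacA`/`bacB` are hypothetical. Re-orienting a contig then amounts to flipping its orientation column and rebuilding.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (characters other than ACGTN pass through unchanged)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def build_pseudomolecule(agp_rows, components):
    """Assemble one object's sequence from its AGP rows.

    agp_rows: tab-separated AGP lines for a single object, in order.
    components: dict mapping component ID (e.g. a BAC accession) to sequence.
    """
    parts = []
    for row in agp_rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and AGP comment/header lines
        cols = row.rstrip("\n").split("\t")
        if cols[4] in ("N", "U"):
            # Gap row: column 6 holds the gap length, filled with Ns.
            parts.append("N" * int(cols[5]))
        else:
            comp_id, beg, end, orient = cols[5], int(cols[6]), int(cols[7]), cols[8]
            piece = components[comp_id][beg - 1:end]  # AGP coordinates are 1-based, inclusive
            if orient == "-":
                piece = revcomp(piece)
            parts.append(piece)
    return "".join(parts)


agp_rows = [
    "\t".join(["chr1", "1", "4", "1", "W", "bacA", "1", "4", "+"]),
    "\t".join(["chr1", "5", "7", "2", "N", "3", "scaffold", "yes", "map"]),
    "\t".join(["chr1", "8", "10", "3", "W", "bacB", "2", "4", "-"]),
]
seqs = {"bacA": "ACGTAA", "bacB": "GGATCC"}
print(build_pseudomolecule(agp_rows, seqs))  # ACGTNNNATC
```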

The final build of chromosome 3 was reached at version 3.5, while the other seven chromosomes underwent one further round of adjustments, resulting in the final version 3.5.1.

Alignments with the Optical Map contigs

The images below illustrate the alignments of the final (corrected) pseudomolecules to the optical map contigs.

To download the XML files, which are viewable with OpGen Viewer, visit our FTP site.

True gap-size estimation

Alignments between the optical map and the final pseudomolecules also permitted estimation of true gap sizes, based on the total size of the unaligned optical map restriction fragments spanning each gap; these estimates are included in the AGP file. The following plot shows the distribution of gap sizes across all 8 pseudomolecules:
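The gap-size estimate can be illustrated with a simplified sketch: walk along an optical map contig and, for each run of unaligned restriction fragments bounded on both sides by aligned fragments, sum the fragment sizes. This assumes a hypothetical input of fragment sizes in map order plus a parallel list of aligned/unaligned flags, treats each bounded run as one sequence gap, and ignores any partial overlap of the flanking fragments with sequence, which a real estimate would need to account for.

```python
def estimate_gap_sizes(fragment_sizes, aligned):
    """Estimate gap sizes from runs of unaligned optical map fragments.

    fragment_sizes: restriction fragment sizes (bp) in map order.
    aligned: parallel list of booleans, True if the fragment aligned to sequence.
    Runs before the first or after the last aligned fragment are contig
    ends, not gaps, so they are ignored.
    """
    gaps = []
    run = 0
    seen_anchor = False
    for size, is_aligned in zip(fragment_sizes, aligned):
        if is_aligned:
            if seen_anchor and run > 0:
                gaps.append(run)  # run bounded by aligned fragments on both sides
            seen_anchor = True
            run = 0
        else:
            run += size
    return gaps


sizes = [5000, 12000, 8500, 15300, 4200, 9900, 7000]
hits = [False, True, False, False, True, True, False]
print(estimate_gap_sizes(sizes, hits))  # [23800]
```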

Vector/contamination screening and removal

Residual vector contaminants detected in the pseudomolecules during the GenBank submission process were hard-masked (replaced with Ns), and their sequence ranges were flagged as gaps in the tiling path. In total, contaminants were identified in 119 BACs, and 77,167 bases were replaced with Ns.
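Hard-masking itself is a simple operation. The minimal sketch below assumes contaminant ranges are given as hypothetical 1-based, inclusive (start, end) pairs; it replaces them with Ns and counts only the bases actually changed, so overlapping ranges or pre-existing Ns are not double-counted in a total such as the one above. Flagging the same ranges as gaps in the AGP tiling path is a separate step not shown here.

```python
def hard_mask(seq, ranges):
    """Replace 1-based inclusive (start, end) ranges with N.

    Returns (masked_sequence, number_of_bases_replaced).
    """
    chars = list(seq)
    replaced = 0
    for start, end in ranges:
        for i in range(start - 1, end):
            if chars[i] != "N":  # don't count bases that were already N
                chars[i] = "N"
                replaced += 1
    return "".join(chars), replaced


print(hard_mask("ACGTACGTAC", [(3, 5), (5, 6)]))  # ('ACNNNNGTAC', 4)
```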

Optical mapping system workflow

To create the Medicago truncatula optical map, approximately 1 million single DNA molecule restriction maps were generated, with a total mass of ~384 Gb, or ~768x coverage of the M. truncatula genome. Divide and conquer, an iterative assembly strategy previously developed for the rice and maize optical map assemblies, was used to assemble this single-molecule data set into 26 optical map contigs, ranging from 958 kb to 38.6 Mb in size and spanning 378 Mb in total; the 25 largest contigs could be aligned to the pseudomolecule sequences.
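The mass and coverage figures above can be cross-checked with simple arithmetic: coverage is total molecule mass divided by genome size, so ~384 Gb at ~768x implies a genome size of about 500 Mb, and ~384 Gb over ~1 million molecules implies an average molecule length of about 384 kb. The sketch below only reworks the numbers stated in the text; the function name is illustrative.

```python
def coverage_stats(total_mass_bp, coverage, n_molecules):
    """Return (implied genome size, mean molecule length), both in bp."""
    implied_genome_bp = total_mass_bp / coverage   # coverage = mass / genome size
    mean_molecule_bp = total_mass_bp / n_molecules
    return implied_genome_bp, mean_molecule_bp


genome_bp, mean_bp = coverage_stats(384e9, 768, 1_000_000)
print(f"implied genome size: {genome_bp / 1e6:.0f} Mb")  # implied genome size: 500 Mb
print(f"mean molecule length: {mean_bp / 1e3:.0f} kb")   # mean molecule length: 384 kb
```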