Coordinate Systems (Except Those Specified in the Extract_Coding_Info.pl Parameters)

All coordinates are given in local, 0-space-based coordinates. In this system the numbers label the spaces between bases rather than the bases themselves, so the interval 0-1 refers to the first base of the sequence you are annotating. 0-space-based coordinates are well suited to specifying insertions and deletions, because there is no ambiguity about which bases were replaced by the mutation.

0-space-based coordinate system:

0 A 1 T 2 G 3 C 4
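This convention is the same half-open interval convention that Python uses for string slicing, so a (begin, end) pair can be used directly as `seq[begin:end]`. The following is a minimal sketch; the helper name `bases_between` is hypothetical and is not part of Extract_Coding_Info.pl.

```python
def bases_between(seq: str, begin: int, end: int) -> str:
    """Return the bases lying between boundary positions begin and end.

    Positions label the gaps between bases, so the interval (i, i + 1)
    covers exactly one base and (i, i) covers none.
    """
    if not 0 <= begin <= end <= len(seq):
        raise ValueError(f"invalid interval ({begin}, {end}) for length {len(seq)}")
    return seq[begin:end]


seq = "ATGC"
print(bases_between(seq, 0, 1))        # first base: A
print(bases_between(seq, 1, 3))        # TG
print(repr(bases_between(seq, 2, 2)))  # '' (the gap between T and G)
```

Because a zero-length interval is legal, an insertion point needs no special encoding.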

Using the notation (<begin>, <end>, <reference>/<mutant>) for a variation, an insertion of a C between the T and the G would be represented as (2, 2, -/C). A substitution of TG with AT would be represented as (1, 3, TG/AT), and a deletion of TG as (1, 3, TG/-).
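The notation maps directly onto string operations: the reference allele must equal `seq[begin:end]`, and applying the variation replaces that slice with the mutant allele. This is a sketch assuming `-` denotes an empty allele, as in the examples above; `apply_variation` is a hypothetical helper for illustration only.

```python
def apply_variation(seq: str, begin: int, end: int, change: str) -> str:
    """Apply a variation written as (begin, end, reference/mutant).

    "-" stands for an empty allele: "-/C" is an insertion, "TG/-" a deletion.
    """
    if not 0 <= begin <= end <= len(seq):
        raise ValueError(f"invalid interval ({begin}, {end}) for length {len(seq)}")
    ref, mut = change.split("/")
    ref = "" if ref == "-" else ref
    mut = "" if mut == "-" else mut
    # The reference allele must match the bases spanned by (begin, end).
    if seq[begin:end] != ref:
        raise ValueError(f"reference {ref!r} does not match {seq[begin:end]!r}")
    return seq[:begin] + mut + seq[end:]


seq = "ATGC"
print(apply_variation(seq, 2, 2, "-/C"))    # insertion of C between T and G: ATCGC
print(apply_variation(seq, 1, 3, "TG/AT"))  # substitution of TG with AT: AATC
print(apply_variation(seq, 1, 3, "TG/-"))   # deletion of TG: AC
```

Checking the reference allele against the sequence catches off-by-one errors, which are the usual symptom of mixing up coordinate systems.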

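Note that Ensembl uses a 1-residue-based coordinate system, in which each number labels a base rather than a gap between bases. Whenever Ensembl coordinates are referenced, the coordinate system will be stated explicitly as a reminder. The same sequence in 1-residue-based coordinates: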

1 2 3 4

A T G C
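Converting between the two systems only requires shifting the start: residue n in 1-residue-based coordinates occupies the interval (n - 1, n) in 0-space-based coordinates. The sketch below assumes the 1-residue-based range is inclusive at both ends, as in the diagram above; the function names are hypothetical, and the separate region offset described next is not applied here.

```python
def one_based_to_zero_space(start: int, end: int) -> tuple[int, int]:
    """Convert an inclusive 1-residue-based range to a 0-space-based interval."""
    # Residue n occupies (n - 1, n), so only the start moves.
    return start - 1, end


def zero_space_to_one_based(begin: int, end: int) -> tuple[int, int]:
    """Convert a 0-space-based interval back to an inclusive 1-residue-based range."""
    return begin + 1, end


print(one_based_to_zero_space(2, 3))  # TG in ATGC: (1, 3)
print(zero_space_to_one_based(1, 3))  # (2, 3)
```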

Positions must be specified in local coordinates. Unless your region of interest starts at the beginning of the genome, there is an offset between your region of interest and the complete reference genome, and local positions are obtained by subtracting the start of the region. For example, if your region of interest is 1000-10000 and your variation is at 1050-1051 globally, its local position is 50-51. If the region of interest is the entire genome, your local coordinate system is the same as your global coordinate system.
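The global-to-local conversion is a single subtraction applied to both ends of the interval. A minimal sketch, with a hypothetical `global_to_local` helper that also rejects intervals falling outside the region:

```python
def global_to_local(region_start: int, region_end: int,
                    begin: int, end: int) -> tuple[int, int]:
    """Shift a global 0-space-based interval into region-local coordinates."""
    if not region_start <= begin <= end <= region_end:
        raise ValueError(
            f"({begin}, {end}) lies outside region ({region_start}, {region_end})"
        )
    return begin - region_start, end - region_start


print(global_to_local(1000, 10000, 1050, 1051))  # (50, 51)
```

With `region_start = 0` the interval comes back unchanged, matching the whole-genome case.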

In the example below, the genome is 9 bases long, so its global boundary positions run from 0 to 9. The region of interest is 3-8 in global coordinates. The G lies at 5-6 globally; subtracting the region start of 3 places it at 2-3 in local coordinates.

GLOBAL: 0 T 1 T 2 T 3 A 4 T 5 **G** 6 C 7 T 8 T 9

LOCAL: 0 A 1 T 2 **G** 3 C 4 T 5
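The worked example can be checked mechanically: slicing the region out of the genome and shifting the G's interval by the region start should select the same base in both coordinate systems. This sketch hard-codes the genome sequence shown above (TTTATGCTT); the variable names are illustrative only.

```python
genome = "TTTATGCTT"  # genome sequence from the example above
region_start, region_end = 3, 8

region = genome[region_start:region_end]  # local sequence
g_global = (5, 6)
g_local = (g_global[0] - region_start, g_global[1] - region_start)

print(region)                           # ATGCT
print(g_local)                          # (2, 3)
print(genome[g_global[0]:g_global[1]])  # G
print(region[g_local[0]:g_local[1]])    # G
```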