Solanaceae Gene Expression Database (SGED) FAQ



1. Where can I find the information on the Expression Profiling Service?

Note: the Expression Profiling Service program has ended and we are no longer accepting any new applications for this service. To view the old application guidelines, visit the retired Expression Profiling Service page.

2. What is a study in SGED?

Generally, a researcher will perform a series of hybridizations and this series makes up one study. A study can have any number of hybridizations.

3. What types of data are in the SGED?

Study Description details - a short description of the hybridizations (organism, tissue, growth conditions, treatment, query, reference), a summary of the sample name abbreviations, and a list of the hybridizations within the study.
Raw and normalized data - all expression data were processed and normalized before loading into the database. All original .gpr (GenePix Results) files and all images (TIFF) from the hybridizations are available as well.

4. What does the "ID" column in the .gpr file mean?

The ID column refers to whether the spot has been validated through sequencing, PCR amplification, and agarose gel electrophoresis. ID=0 means the spot has not been validated. ID=1 means the spot is validated.

5. What is a TC?

TC stands for Tentative Consensus sequence. TCs are created by assembling ESTs into virtual transcripts. In some cases, TCs contain full or partial cDNA sequences (ETs) obtained by classical methods. TCs contain information on the source library and abundance of ESTs and in many cases represent full-length transcripts. Alternative splice forms are built into separate TCs.

6. How were the GO and GO Slim IDs assigned?

Potato EST sequences were searched against an Arabidopsis protein database annotated by TIGR and TAIR, using blastx with a stringency cutoff of <1e-10. All unique hits meeting this criterion were selected. GO Slim assignments were made using a downloaded plant GO Slim file. For the STM clones on the potato array, we selected the GO annotation from the TC or singleton sequence(s) that contain the STM 5' and/or 3' sequence.
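The hit selection step can be illustrated with a minimal Python sketch. It assumes the blastx results have already been parsed into records; the `Hit` type, its field names, and the reading of "unique" as one record per query/subject pair are assumptions made for illustration and are not taken from the SGED pipeline.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 1e-10  # stringency cutoff stated in the FAQ


class Hit(NamedTuple):
    """One parsed blastx hit (hypothetical record layout)."""
    query_id: str    # potato EST identifier
    subject_id: str  # Arabidopsis protein identifier
    e_value: float


def select_unique_hits(hits):
    """Keep hits with an E-value below the cutoff.

    Assumption: "unique" means a given (query, subject) pair is kept once.
    """
    seen = set()
    selected = []
    for hit in hits:
        key = (hit.query_id, hit.subject_id)
        if hit.e_value < E_VALUE_CUTOFF and key not in seen:
            seen.add(key)
            selected.append(hit)
    return selected
```

A hit with an E-value exactly equal to 1e-10 is rejected here, because the stated cutoff is a strict "less than".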

7. What is the slide scanning process?

Slides are scanned using an Axon 4000B scanner (Axon Instruments, Union City, CA). Both the 635 nm (red, Cy5) and 532 nm (green, Cy3) channels are scanned simultaneously at 100% laser power. The PMT (photomultiplier tube) settings are set between 600 and 950 to balance the signal intensities over the two channels as much as possible. Slides are scanned at a resolution of 10 microns. Images are saved in an uncompressed TIFF file format for both channels.

8. How are the images quantified?

The TIFF images are quantified using GenePix 5.1 (Axon Instruments, Union City, CA). The Cy3 and Cy5 images are analyzed simultaneously. Using a GAL file (Gene Array List), the grid is overlaid on the image. Initially, print blocks of the array are identified automatically by the software and adjusted manually where needed. After block alignment, the features within the blocks are identified automatically by the software. The software automatically flags spots that cannot be found in one of the channels by assigning a flag value of -50. The raw intensities of the quantified image are saved in a gpr file. Using a Perl script, additional spots are flagged if they meet any of the following criteria: spots containing > 30% saturated pixels in either channel, spots with a diameter < 70 µm in either channel, and spots that could not be validated during the microarray production process. These spots are all assigned a flag value of -100. The background intensity is calculated by the GenePix software as follows: the median pixel intensity is calculated from a circular region with three times the diameter of the spot, excluding the pixels assigned to neighboring spots. The median is used for the background intensity to reduce the effect of spurious pixels contributing to the background.
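The additional flagging rules can be sketched in Python as follows. This is not the Perl script used for SGED; the function name, its parameters, and the choice to leave already-flagged spots (for example -50) unchanged are assumptions made for illustration.

```python
FLAG_NOT_FOUND = -50    # assigned by the quantification software
FLAG_FAILED_QC = -100   # assigned by the post-processing step

MAX_SATURATED_PCT = 30.0  # flag if more than 30% of pixels are saturated
MIN_DIAMETER_UM = 70.0    # flag if the spot diameter is below 70 micrometers


def qc_flag(current_flag, saturated_pct, diameters_um, validated):
    """Return the flag value for one spot.

    saturated_pct and diameters_um are (Cy5, Cy3) pairs; a spot fails
    if either channel violates a threshold. Assumption: spots that
    already carry a negative flag keep it.
    """
    if current_flag < 0:
        return current_flag
    too_saturated = any(p > MAX_SATURATED_PCT for p in saturated_pct)
    too_small = any(d < MIN_DIAMETER_UM for d in diameters_um)
    if too_saturated or too_small or not validated:
        return FLAG_FAILED_QC
    return current_flag
```

For example, `qc_flag(0, (10.0, 40.0), (100.0, 100.0), True)` returns -100 because one channel exceeds the saturation threshold.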

9. What is the background correction and data normalization process?

For background correction and data normalization, the raw intensities of each hybridization are loaded into the limma package of Bioconductor using the read.maimages function. For the Cy5 channel, the F635 Mean column of the gpr file is used as foreground intensity and the B635 Median column is used as background intensity. For the Cy3 channel, the F532 Mean and B532 Median columns are used as foreground and background intensities. Spots with a negative Flag value (Flags column in the gpr file) are assigned a weight of 0 in limma. Background subtraction and normalization are performed by the normalizeWithinArrays function of limma. Background intensities are subtracted from the foreground intensities and negative values are set to 0. Background-corrected intensities are normalized by the printtiploess method using default parameters. Because the flagged spots are assigned a weight of 0, these spots are excluded from the normalization process (and loess curve fitting). The normalized and background-subtracted intensity values are exported from limma for both the red (Cy5) and green (Cy3) channels. The flagged spots are assigned a value of "NA".
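The background subtraction and the handling of flagged spots can be illustrated with a short Python sketch. It covers only those two steps and does not attempt the print-tip loess normalization, which is done in limma. The function names and the dictionary-based spot record are assumptions for illustration; the keys are the gpr column names mentioned above, and `None` stands in for "NA".

```python
def background_correct(foreground, background):
    """Subtract background from foreground; negative results are set to 0."""
    return max(foreground - background, 0.0)


def corrected_channels(spot):
    """Return (Cy5, Cy3) background-corrected intensities for one spot.

    spot is a dict keyed by gpr column names (hypothetical record layout).
    Spots with a negative Flags value are returned as (None, None),
    mirroring the weight of 0 and the exported "NA".
    """
    if spot["Flags"] < 0:
        return (None, None)
    red = background_correct(spot["F635 Mean"], spot["B635 Median"])
    green = background_correct(spot["F532 Mean"], spot["B532 Median"])
    return (red, green)
```

With a spot whose Cy5 values are 1200 (foreground) and 200 (background) and whose Cy3 values are 150 and 300, the result is `(1000.0, 0.0)`: the Cy3 difference is negative and is therefore set to 0.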


Last modified: Monday, 26-Jan-2009 14:33:18 EST