analysis & bioinformatics

The Microarray Centre also offers analysis and bioinformatics-related services. Our Bioinformatics Team provides many services, such as discussion and analysis of your array-based project and experimental design assessments, complimentary to UHNMAC customers and UHN researchers.

Data analysis for microarray projects

A basic data analysis package is available on a per-project basis. Our data analysis service is also available to customers with microarray data obtained from other facilities. The basic data analysis package includes:

  • consultation on experimental design (complimentary)
  • extensive quality control, filtering, and normalisation
  • statistical analyses such as clustering and t-tests/ANOVA using the R/Bioconductor and GeneSpring software packages
  • data available via secure online User Portal

Pricing for advanced data analysis is also quoted on a per project basis. Advanced data analysis can include:

  • gene ontology
  • pathway analysis
  • literature searches

Other bioinformatics services available:

  • sequence and SNP analysis from raw chromatograms
  • BLAST searches
  • multiple alignments and trees
  • primer design
  • antibody epitope prediction
  • computer programming

We have also put together some information that you may find helpful. If you have any suggestions or additions, please contact us and let us know!

Quick questions:

How many replicates are needed for a microarray experiment?

As with any science experiment, the reliability of your data increases with the number of replicates performed. Of course, if you are using a limited source of RNA, the number of replicates you can perform will also be limited. Most groups strive to have at least 3 replicates to allow for statistical analysis. For UHNMAC Service packages that include data analysis, a minimum of 3 replicates is required.

I've done my microarray experiment and the Cy5/Cy3 ratios are all between 0.5 and 2. What did I do wrong?

It is possible that you did nothing wrong! When comparing two RNA samples, the majority of genes will not be differentially expressed and thus the majority will have ratios around 1. If you are sure that at least a few genes should be differentially expressed, you may have to repeat the experiment (and do a reciprocal labelling!) to verify the results.

What is pre-processing?

Pre-processing is a step that extracts or enhances meaningful data characteristics and is often performed prior to analyzing or “processing” the data. General pre-processing techniques include log transformations, combining replicates, eliminating outliers, use of control spots, and normalisation.

What is normalisation? What is the difference between global and sub-grid normalisation?

Normalisation means adjusting microarray data to account for systematic differences across data sets. Most often, normalisation is used to account for the different dye efficiencies in a two-colour experiment.

Global normalisation takes into account all areas of the array during normalisation. Significant local effects can heavily influence this method. Sub-grid normalisation calculates the normalisation factor for each sub-grid independently, thus making this method insensitive to local variations on the array.
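As a rough illustration of the difference, the sketch below centres log2 ratios on the median, once across the whole array and once within each sub-grid. Median centring is an assumption chosen for simplicity, not necessarily the method used in our analysis pipeline, and the function names and data are hypothetical.

```python
from statistics import median

def global_normalise(log_ratios):
    """Subtract the array-wide median from every log2 ratio."""
    centre = median(log_ratios)
    return [m - centre for m in log_ratios]

def subgrid_normalise(log_ratios, subgrid_ids):
    """Subtract each sub-grid's own median from the spots in that sub-grid."""
    by_grid = {}
    for m, g in zip(log_ratios, subgrid_ids):
        by_grid.setdefault(g, []).append(m)
    centres = {g: median(values) for g, values in by_grid.items()}
    return [m - centres[g] for m, g in zip(log_ratios, subgrid_ids)]

# Hypothetical data: sub-grid 1 carries a local bias of about +1
log_ratios = [0.1, -0.1, 0.0, 1.1, 0.9, 1.0]
subgrid_ids = [0, 0, 0, 1, 1, 1]

global_result = global_normalise(log_ratios)
subgrid_result = subgrid_normalise(log_ratios, subgrid_ids)
```

In this toy data set the global version leaves sub-grid 1 shifted upwards relative to sub-grid 0, while the sub-grid version centres both sub-grids on zero.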

Do you use housekeeping genes for normalisation?

Due to the controversy over what constitutes a housekeeping gene (for a given organism, tissue, condition, etc), we tend not to use housekeeping genes for normalisation.

What is LOWESS?

LOWESS (LOcally WEighted Scatterplot Smoothing), also known as LOESS, is a locally weighted polynomial regression. The general idea for this kind of normalisation is to fit a mathematical function through the data and obtain a model of the distortion, and then use this model to adjust the data. The LOWESS function is a curve-fitting method. It performs a local fit to the data in an intensity-dependent manner. The intensity value for each spot is normalised based on the data distribution in the immediate neighbourhood of the spot’s intensity.
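The idea can be sketched in a few lines of Python. This is a simplified version, assuming a local linear fit with tricube weights and no robustness iterations; production implementations (for example in R/Bioconductor) do more. The function name, the `frac` parameter, and the data are illustrative only.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Simplified LOWESS: weighted local linear fit evaluated at each x.

    frac is the fraction of points used in each local neighbourhood.
    No robustness iterations are performed.
    """
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]          # bandwidth: distance to k-th nearest point
        if h == 0:
            h = 1e-12                     # guard against duplicated x values
        # Tricube weights: close points count most, points beyond h count zero
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dists]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical spots: average log intensity and log2 ratio with an
# intensity-dependent drift
a_values = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
m_values = [0.05, 0.12, 0.18, 0.33, 0.38, 0.52, 0.61, 0.68]

trend = lowess_fit(a_values, m_values, frac=0.5)
normalised_m = [m - t for m, t in zip(m_values, trend)]
```

Subtracting the fitted trend from each log ratio removes the intensity-dependent distortion that the curve models.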

Why log ratios? Why log base 2?

The logarithmic transformation provides values that are more easily interpretable and more biologically meaningful. It is convenient to log transform numbers in order to eliminate misleading disproportion between two relative changes. For example, assume two spots both have intensity values of 1000 in the control sample, and values of 100 and 10,000 in the treated sample. The absolute difference between the control and treated samples is 900 and 9000, respectively, for the two spots. However, from a biological point of view, the phenomenon is the same: a 10-fold change in both genes (a 10-fold decrease for one gene, and a 10-fold increase for the other). By using log transformation, fold changes happening around small intensity values will be comparable to fold changes happening around large intensity values. In this example, taking log base 10 of the treated/control ratio gives -1 for one gene and 1 for the other.

Log base 2 has the advantage of producing a continuous spectrum of values and treating up- and down-regulated genes in a similar way. A gene up-regulated by a factor of 2 has a log2 (ratio) of 1, a gene down-regulated by a factor of 2 has a log2 (ratio) of –1 and a gene with no change in expression (ratio of 1) has a log2 (ratio) equal to zero. The log base 2 transformations are convenient and make further analysis and data interpretation easier.
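The arithmetic in the two paragraphs above can be checked with a short Python sketch. The variable names are illustrative; the intensity values are the ones used in the example.

```python
import math

control = 1000
treated = [100, 10000]

# Absolute differences look very different (900 versus 9000) ...
abs_differences = [abs(t - control) for t in treated]

# ... but the log10 ratios are symmetric around zero
log10_ratios = [math.log10(t / control) for t in treated]

# Log base 2: a 2-fold increase, a 2-fold decrease, and no change
log2_ratios = [math.log2(ratio) for ratio in (2, 0.5, 1)]
```

Here `log10_ratios` comes out at about -1 and 1, and `log2_ratios` at 1, -1 and 0, so equal fold changes in opposite directions sit at equal distances from zero.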

What is one-way ANOVA? Two-way ANOVA?

ANOVA stands for Analysis of Variance. The idea behind ANOVA is to compare the variability between groups with the variability within groups. One-way ANOVA investigates the data by considering only one factor, or in other words, considers only one way of partitioning the data into groups. Two-way ANOVA considers that the data can be grouped by two factors.
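For the one-way case, the comparison of between-group and within-group variability boils down to an F statistic. The sketch below computes it directly from its textbook definition; the function name and the expression values are hypothetical, and a real analysis would also convert F to a p-value and correct for multiple testing.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far group means sit from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical log2 expression values for one gene under three conditions,
# three replicates each
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F means the group means differ by more than the replicate-to-replicate noise would explain.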

What is an M versus A plot?

M is defined as log2 (LexE/LexR) and A as (log2 (LexE*LexR))/2. This plot, as opposed to a log vs log plot, allows for the rapid identification of skewed data. Data points in a perfectly normalised data set will be centred on the M=0 axis.
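A minimal sketch of the calculation follows, assuming LexE and LexR denote the experimental and reference channel intensities for each spot. That reading, the function name, and the intensity values are assumptions made for illustration.

```python
import math

def ma_values(experimental, reference):
    """Return (M, A) lists: M = log2(E/R), A = log2(E*R) / 2 for each spot."""
    m = [math.log2(e / r) for e, r in zip(experimental, reference)]
    a = [0.5 * math.log2(e * r) for e, r in zip(experimental, reference)]
    return m, a

# Hypothetical intensities for three spots
experimental = [2048, 1024, 512]
reference = [512, 1024, 2048]

m, a = ma_values(experimental, reference)
```

Plotting `m` against `a` gives the M versus A plot: M is the log ratio and A is the average log intensity, so any intensity-dependent skew shows up as a curve away from M=0.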

What do you do with saturated spots?

In a 16-bit TIFF file, spots with an intensity value of 65,535 (the maximum a 16-bit pixel can record) are considered saturated. The true intensity of these spots is unknown, so they are flagged and often excluded from further analysis.
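Flagging can be sketched as a simple threshold test. The largest value a 16-bit pixel can hold is 2^16 - 1 = 65,535; the function name and the intensity list below are hypothetical.

```python
SATURATION_16BIT = 2 ** 16 - 1  # 65535, the largest 16-bit unsigned value

def flag_saturated(intensities, limit=SATURATION_16BIT):
    """Return one boolean per spot: True where the intensity hit the limit."""
    return [value >= limit for value in intensities]

# Hypothetical spot intensities
intensities = [1200, 65535, 40321, 65535]
flags = flag_saturated(intensities)

# Keep only the spots whose true intensity was actually measured
usable = [v for v, saturated in zip(intensities, flags) if not saturated]
```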

What are distance metrics? (Euclidean, manhattan)

A distance metric, also known as a similarity metric, is a function that takes two points (x and y) in an n-dimensional space and has the following properties: symmetry, positivity, and the triangle inequality. The Euclidean distance is the straight-line (shortest) distance between x and y, while the Manhattan (city block) distance is one in which movement can only be parallel to the coordinate axes.
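Both distances are easy to compute by hand; the following sketch shows them side by side for points of any dimension. The function names and the example points are illustrative.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block distance: sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(x, y))

p = (0, 0)
q = (3, 4)
d_euclidean = euclidean(p, q)  # 5.0
d_manhattan = manhattan(p, q)  # 7
```

In clustering, each point would typically be the expression profile of one gene (or one sample) across n arrays.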

What is PCA?

Principal Component Analysis (PCA) is a numerical procedure carried out to discover or reduce dimensionality of the data set, identify new meaningful underlying variables, and to magnify the trends in data (increasing separation of poorly correlated elements and bringing highly correlated elements closer together). PCA rotates the data space, aligning the directions of the greatest variability in the data (the first and second principal component) with the x and y axes of the scatter plot.
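The rotation can be illustrated for the simplest case of two variables, where the angle of the first principal component has a closed form. This is a sketch of that special case only; real expression data have many dimensions and need a general eigen-decomposition or SVD. The function name and the data points are hypothetical.

```python
import math

def pca_2d(points):
    """Rotate 2-D data so the direction of greatest variance lies on the first axis.

    Returns the rotation angle (radians) and the rotated (PC1, PC2) scores.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample variances and covariance of the centred data
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)

    # Angle of the first principal component for a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Project each centred point onto the rotated axes
    scores = [(x * c + y * s, -x * s + y * c) for x, y in centred]
    return theta, scores

# Hypothetical measurements of two correlated variables
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]
theta, scores = pca_2d(points)
```

After the rotation most of the spread in the data lies along the first coordinate of `scores`, with only the residual scatter left on the second.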

How can I validate my microarray results?

Validation is most often performed by real-time (quantitative) PCR. Validation can also be performed using NanoString assays, Bio-Plex assays, and the Ziplex platform. More information about data validation services offered at the UHNMAC can be found here.

What is MIAME?

Minimum Information About a Microarray Experiment (MIAME) is a set of guidelines that outlines the minimum information required to interpret unambiguously, and possibly verify, microarray experiments. Visit the Microarray Gene Expression Data Society (MGED) for more information.

Brazma, A, et al. Minimum information about a microarray experiment (MIAME): toward standards for microarray data. Nature Genetics, 2001, 29(4):365-371.

What is gene ontology?

Gene Ontology (GO) is a controlled vocabulary to describe gene and gene product attributes of any organism. The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The three organising principles of GO are molecular function, biological process and cellular component.