Dr. Shelley Bull

Senior Investigator

Samuel Lunenfeld Research Institute

Title: "Regression models, scan statistics and reappearance probabilities: Detection of regions of association between microarray measures of gene expression and copy number"

The literature on methods to detect differentially expressed genes or DNA copy number alterations in microarray investigations of cancer biology and outcome is extensive, but few papers consider methods for joint analysis of the two quantities, and covariate information is generally underused. Our interest in methods for joint analysis is motivated by a prospective study of molecular alterations in tumours from patients with axillary node-negative (ANN) breast cancer. As part of this ongoing study, cDNA microarrays were used to measure relative gene expression in RNA from a subsample of ANN tumours and to assess relative copy number by array comparative genomic hybridization (aCGH) of DNA from the same tumours. The tumours arrayed in these studies were chosen to facilitate comparisons by prognostic characteristics and clinical outcome.

Early studies based on microarray data used linear models to quantify the relationship between measures of gene expression (GE) and copy number (CN) obtained from tumour samples. We propose a regression-based scan statistic to identify within-chromosome clusters of genetic markers that exhibit association between GE and CN, while accounting for explanatory covariates such as tumour characteristics known to be prognostic for clinical outcome. To measure the association between GE and CN, we regress GE on CN at each genetic marker, including subject-specific covariates. In developing the scan statistic, we approximate the spatial distribution of markers with a statistically significant association by a Poisson process. Because it incorporates the distance between markers, the scan statistic accounts for the spatial nature of CN alterations. Regions identified as clusters of significant associations are hypothesized to harbour genes involved in breast cancer progression. Using simulations, we examine the sensitivity of the method to certain factors. To address repeatability, we consider marker reappearance probabilities within detected regions and assess the utility of a quantity estimated by bootstrap sample frequencies. Applications of the proposed method to joint analysis of GE and CN, with and without an informative covariate, and comparisons with alternative methods suggest that including covariates and using a regional test statistic can help refine candidate regions for further investigation.
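The pipeline described above (per-marker regression, clustering of significant markers along a chromosome, bootstrap reappearance) can be illustrated with a minimal sketch in Python using NumPy and SciPy. It is not the authors' implementation, and the abstract does not specify their exact statistic. Assumptions made here for illustration only: ordinary least squares at each marker with a t-test on the CN coefficient; a fixed window width `w`; a homogeneous Poisson null whose rate is estimated from the observed number of significant markers; a Monte Carlo p-value for the maximum window count; and reappearance measured as the fraction of subject-level bootstrap resamples in which a marker falls inside the top-scoring window. All function names (`marker_pvalues`, `max_window_count`, `scan_pvalue`, `reappearance`), the covariate `grade`, and the data are hypothetical and synthetic.

```python
import numpy as np
from scipy import stats


def marker_pvalues(ge, cn, z):
    """Hypothetical helper: per-marker OLS fit of GE on CN plus covariates.

    ge, cn: (n_subjects, n_markers) arrays; z: (n_subjects,) or (n_subjects, q).
    Returns two-sided t-test p-values for the CN coefficient at each marker.
    """
    n, m = ge.shape
    pvals = np.empty(m)
    for j in range(m):
        X = np.column_stack([np.ones(n), cn[:, j], z])  # intercept, CN, covariates
        beta, *_ = np.linalg.lstsq(X, ge[:, j], rcond=None)
        resid = ge[:, j] - X @ beta
        df = n - X.shape[1]
        cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
        t = beta[1] / np.sqrt(cov[1, 1])
        pvals[j] = 2 * stats.t.sf(abs(t), df)
    return pvals


def max_window_count(positions, w):
    """Largest number of points in any window of width w, plus that window.

    The maximum is always attained by a window whose left edge is a point.
    """
    pos = np.sort(np.asarray(positions, dtype=float))
    if pos.size == 0:
        return 0, None
    ends = np.searchsorted(pos, pos + w, side="right")  # first index beyond x + w
    counts = ends - np.arange(pos.size)
    i = int(np.argmax(counts))
    return int(counts[i]), (pos[i], pos[i] + w)


def scan_pvalue(observed, n_sig, length, w, n_sim=2000, seed=0):
    """Monte Carlo p-value for the maximum window count, treating significant
    markers as a homogeneous Poisson process on [0, length], rate n_sig / length."""
    rng = np.random.default_rng(seed)
    rate = n_sig / length
    hits = 0
    for _ in range(n_sim):
        pts = rng.uniform(0.0, length, size=rng.poisson(rate * length))
        hits += max_window_count(pts, w)[0] >= observed
    return (hits + 1) / (n_sim + 1)


def reappearance(ge, cn, z, positions, w, alpha=0.01, n_boot=50, seed=0):
    """Fraction of bootstrap resamples of subjects in which each marker falls
    inside the top-scoring window."""
    rng = np.random.default_rng(seed)
    n = ge.shape[0]
    freq = np.zeros(len(positions))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        p = marker_pvalues(ge[idx], cn[idx], z[idx])
        _, win = max_window_count(positions[p < alpha], w)
        if win is not None:
            freq += (positions >= win[0]) & (positions <= win[1])
    return freq / n_boot


# Synthetic example: 60 subjects, 200 evenly spaced markers on a 100 Mb
# chromosome, with a GE-CN association planted in (40, 50) Mb.
rng = np.random.default_rng(1)
n, m, length, w = 60, 200, 100.0, 10.0
positions = np.linspace(0.0, length, m)
grade = rng.integers(0, 2, size=n).astype(float)  # hypothetical binary covariate
cn = rng.normal(size=(n, m))
ge = 0.3 * grade[:, None] + rng.normal(size=(n, m))
region = (positions > 40) & (positions < 50)
ge[:, region] += 0.8 * cn[:, region]

p = marker_pvalues(ge, cn, grade)
sig_pos = positions[p < 0.01]
k, window = max_window_count(sig_pos, w)
scan_p = scan_pvalue(k, sig_pos.size, length, w)
reapp = reappearance(ge, cn, grade, positions, w, n_boot=20)
print(k, window, scan_p)
print(reapp[region].round(2))
```

Simulating the maximum count over all windows, rather than using a single-window Poisson tail probability, accounts for having scanned the whole chromosome. Including `grade` in each per-marker regression mirrors, in simplified form, the covariate adjustment the abstract describes.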

This is joint work with Jennifer L. Asimit and Irene L. Andrulis.