

Blood Pressure


Last modified 2003-03-02 22:30


Please check this page regularly for updates, corrections, and answers to frequently-asked questions!


Our appreciation goes out to Dr. Raymond Lam, GlaxoSmithKline, Toronto, Ontario, Canada, for providing this case study.



Genes contribute to the development and progression of disease and they also influence how individuals respond to medicines. At GlaxoSmithKline (GSK), we are conducting genetic and genomic research which will allow the medical community to accurately prescribe the right medicine for the right patient. 

Genetics research studies often collect hundreds to thousands of genetic markers, together with many clinical measurements.  Statistical tools are useful for separating 'true' genes from 'false' alarms.

Data Description 
The data file (a comma-delimited ASCII file) contains 500 observations (subjects) and 501 variables.  Of the 500 subjects, 250 had low blood pressure and 250 had high blood pressure (i.e. hypertension).  The 501 variables consist of one response variable (systolic blood pressure) and 500 predictors (17 clinical covariates and 483 genetic markers).  These variables are described below.
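As a quick sanity check on the description above, the file can be read and its dimensions compared with the stated 500 subjects and 501 variables. The following is only a minimal sketch: it assumes the first line of the file is a header row of variable names, which the description does not confirm, and the function names `read_subjects` and `has_expected_shape` are hypothetical.

```python
import csv
import io


def read_subjects(stream):
    """Read comma-delimited text from an open stream.

    Assumes (not confirmed by the data description) that the first
    line holds the variable names. Returns (header, rows), where each
    row is a dict keyed by variable name.
    """
    reader = csv.DictReader(stream)
    rows = list(reader)
    header = list(reader.fieldnames or [])
    return header, rows


def has_expected_shape(header, rows, n_subjects=500, n_variables=501):
    """Check the 500 x 501 layout stated in the data description."""
    return len(rows) == n_subjects and len(header) == n_variables


# Tiny in-memory stand-in for the real file (column names are made up).
sample = io.StringIO("SBP,Gender,Marker1\n120,M,0_0\n150,F,0_1\n")
header, rows = read_subjects(sample)
print(len(rows), len(header))
```

For the real file, the same function could be called on `open(path, newline="")`, where `path` is wherever the data file was saved.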

The attributes (variables) in this study are:



Systolic Blood Pressure (SBP)

Continuous response variable


Gender

Binary variable:

M = Male, F = Female

Marital Status

Binary variable:

Y = Married, N = Not Married

Smoking Status

Binary variable:

Y = Smoker, N = Non-Smoker


Age

Continuous variable (years)


Weight

Continuous variable (lbs)


Height

Continuous variable (inches)

Body Mass Index (BMI)

Continuous variable:

BMI = 703 * Weight / Height^2 (weight in lbs, height in inches)


Weight Category

Categorical variable:

1 = Normal, 2 = Overweight, 3 = Obese.


Categorical variable taking values 1, 2, 3, or 4.

Exercise level

Categorical variable:

1 = Low, 2 = Medium, 3 = High

Alcohol Use

Categorical variable:

1 = Low, 2 = Medium, 3 = High

Stress Level

Categorical variable:

1 = Low, 2 = Medium, 3 = High

Salt (NaCl) Intake Level

Categorical variable:

1 = Low, 2 = Medium, 3 = High

Childbearing Potential

Categorical variable:

1 = Male, 2 = Able Female, 3 = Unable Female

Income Level

Categorical variable:

1 = Low, 2 = Medium, 3 = High

Education Level

Categorical variable:

1 = Low, 2 = Medium, 3 = High

Treatment (for hypertension)

Binary variable:

Y = Treated, N = Untreated

483 Genetic Markers

Categorical variables, each taking one of three genotype values: 0_0, 0_1, 1_1
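Two of the variables listed above lend themselves to a short worked example: the BMI formula and the three-level marker coding. The sketch below recomputes BMI from weight and height, and turns a marker value into a number. The numeric marker coding (a count of copies of allele 1) is an assumption made for illustration; the data description does not say how the markers should be coded for analysis, and the function names are hypothetical.

```python
def bmi(weight_lbs, height_in):
    """Body Mass Index from weight in pounds and height in inches."""
    return 703.0 * weight_lbs / height_in ** 2


# Assumed additive coding: number of copies of allele "1".
# This is one common convention, not something the source specifies.
GENOTYPE_TO_COUNT = {"0_0": 0, "0_1": 1, "1_1": 2}


def allele_count(genotype):
    """Map a marker value such as '0_1' to 0, 1, or 2."""
    return GENOTYPE_TO_COUNT[genotype.strip()]


# A 150 lb subject who is 65 inches tall has a BMI of about 24.96.
print(round(bmi(150, 65), 2))
print([allele_count(g) for g in ["0_0", "0_1", "1_1"]])
```

Treating each marker as a three-level categorical factor instead of a count is an equally reasonable choice and makes no assumption about additivity.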



For this case study, a genetic data set was generated from a complex genetic model we developed at GSK.  There are 500 predictors (483 genetic markers and 17 clinical covariates).  The goal is to identify the 'true' predictors among the 500 variables and, at the same time, control the false discovery rate.  Therefore, the objectives are:


1.     Identify 'true' genes and clinical covariates; and

2.     Control False Discovery (number of true X's versus number of false X's identified).
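One widely used way to pursue the second objective is the Benjamini-Hochberg step-up procedure, sketched below. It is offered as an illustration only: the case study does not prescribe a method, and the sketch assumes that one p-value per predictor has already been obtained from some test of association with SBP. The function names, the example p-values, and the default level of 0.05 are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of predictors selected at FDR level alpha.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= alpha * k / m, and select the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def false_discovery_proportion(selected, true_predictors):
    """Share of selected predictors that are not truly associated."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected - set(true_predictors)) / len(selected)


# Made-up p-values for ten predictors, purely for illustration.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

The second helper can only be evaluated when the true predictors are known, as they are for a simulated data set like this one once the generating model is revealed.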

Frequently Asked Questions

Please check this section regularly for updates.