Fall 2001

Math 3330 3.0 BF: Regression Analysis

Last update: November 5, 2001
by Georges Monette
"There are three kinds of lies: lies, damned lies and statistics." (Benjamin Disraeli)
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H.G. Wells)

Contents:

Current News

General information

Computing

Getting Started in the Gauss Lab (S110 Ross)
If you used the Gauss Lab (now formally known as the Arts Multimedia Lab) in a previous course, you can continue to use the same account. If you haven't used the Lab before, you need to do two things:
  1. Get door access to the Lab
  2. Get a computer account for the Lab
Refer to the orange handout entitled 'AML WELCOME KIT -- Gauss Edition' for details.
Getting Started with S-Plus in the Gauss Lab
There are two web pages with instructions on how to get started: Lesson 1 and Lesson 2 of Getting Started with S-Plus.
After completing these two lessons, you should be ready to work through an on-line tutorial originally written by Annie Dupuis at Dalhousie University (http://www.utstat.toronto.edu/splus/contents.html). Expect to spend about 10 hours on this tutorial, and aim to complete it by the end of September.

Course Work

  1. Assignments: (15%) 4 assignments.
  2. Quizzes: (20%) Best 4 of 5 15-minute quizzes
  3. Mid-term test: (25%) A 1-hour mid-term test
  4. Data analysis: (10%) A written report on the analysis of a suitable data set of your choice
  5. Final exam: (30%)

Schedule

Note that the topics might change as the course progresses.
 
Week
Schedule
Week 1
September 10
Topics:
  • Introduction to S-Plus and R.
  • Review of basic inference: two-sample confidence intervals and tests, Anova for 3 or more groups.
  • The many roles of statistics: from scientific research to business decisions.
  • Sources of data and the purpose of inference:
    • Observation vs Experiment
    • Causal inference vs Prediction
    • Sampling and Randomization
  • Unconditional vs Conditional association: Simpson's Paradox
  • The role of regression analysis. How well can statistical control compensate for the lack of experimental control?
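Simpson's Paradox can be reproduced with a handful of counts. The sketch below is written in Python rather than S-Plus, and the counts are hypothetical, chosen only so that treatment A has the higher success rate within each stratum yet the lower rate once the strata are pooled.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for two treatments within two strata.
# These counts are invented for illustration; they are not course data.
COUNTS = {
    "A": {"stratum 1": (9, 10), "stratum 2": (30, 100)},
    "B": {"stratum 1": (80, 100), "stratum 2": (2, 10)},
}


def conditional_rates(group):
    """Success rate within each stratum (conditional association)."""
    return {s: Fraction(k, n) for s, (k, n) in COUNTS[group].items()}


def marginal_rate(group):
    """Success rate after pooling the strata (unconditional association)."""
    successes = sum(k for k, _ in COUNTS[group].values())
    trials = sum(n for _, n in COUNTS[group].values())
    return Fraction(successes, trials)


if __name__ == "__main__":
    for group in COUNTS:
        cond = {s: round(float(r), 3) for s, r in conditional_rates(group).items()}
        print(group, "conditional:", cond, "marginal:", round(float(marginal_rate(group)), 3))
```

The stratum acts as a lurking variable: most of A's trials fall in stratum 2, where both treatments do worse, so the pooled comparison penalizes A.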
Assignment 1 given out. Due October 1.
  • Part 1: Complete Lesson 1 and Lesson 2 of Getting Started with S-Plus. Hand in the exercise at the end of Lesson 2.
  • Part 2: Hand in your answer to the following question (from p. 20 of FPP):

  • In 1964, the Public Health Service of the United States studied the effects of smoking on health in a sample of 42,000 households. For men and for women in each age group, they found that those who had never smoked were on average somewhat healthier than the current smokers, but the current smokers were on average much healthier than the former smokers.
      • Why did they study men and women and the different age groups separately?
      • The lesson seems to be that you shouldn't start smoking, but once you've started, don't stop. Comment.
References:
  • Your textbook for introductory statistics.
  • ARA: Chapter 1.
  • FPP: Chapters 1 and 2.
Tutorial: A special tutorial will be held from 10 am to 12 noon on Saturday, September 15, in the Gauss Lab (S110 Ross) to help students get started with S-Plus.
September 17 No class
Note: Friday, Sept. 21 is the last day to enroll without permission of the instructor.
Week 2
September 24
Topics:
  • Regression analysis = study of conditional distribution of Y given X
  • Main characteristics of conditional distributions
  • Patterns of dependency on X
  • Basic ideas of non-parametric regression
  • Examining data: histograms, density estimation, quantiles, quantile comparison plots, boxplots.
  • Bivariate data: scatterplots, side-by-side boxplots.
  • Multivariate data: scatterplot matrix, 3-d plots, coplots. [Link with Simpson's Paradox]
References: 
  • ARA, Chaps. 2 and 3.
  • SGS, Chap. 7.11, pp. 158 ff.
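A basic idea of non-parametric regression is to estimate the conditional mean of Y at a focal value x0 by averaging the Y values of the observations whose X values lie closest to x0. The Python sketch below implements this nearest-neighbour average; the function name knn_mean and the data are hypothetical, and it is only one of several local-averaging schemes.

```python
def knn_mean(x, y, x0, k):
    """Average of y over the k observations whose x is closest to x0.

    Ties in distance are broken by the order of the data (sorted() is
    stable), which is an arbitrary choice made for this sketch.
    """
    if not 1 <= k <= len(x):
        raise ValueError("k must be between 1 and the number of observations")
    # Sort observation indices by their distance from the focal point x0.
    order = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / k


if __name__ == "__main__":
    # Hypothetical noiseless data with a curved relationship, y = x**2.
    xs = list(range(21))
    ys = [xi ** 2 for xi in xs]
    # Trace the estimated conditional mean at a few focal values.
    for x0 in (2, 10, 18):
        print(x0, round(knn_mean(xs, ys, x0, k=3), 3))
```

Because y = x^2 is convex, the 3-point average at x0 = 10 is 302/3 (about 100.67) rather than 100: a wider window (larger k) smooths more but adds bias where the curve bends.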
Links: Exercises: 
Week 3
October 1

Quiz 1
Assignment 1 due

Topics:
  • Transforming distributions: Tukey's ladder of powers and skewness
  • Transforming relationships: Tukey's ladder of powers and non-linearity
  • Transformations to correct heteroskedasticity
  • Putting all the transformations together
  • The logit transformation for proportions (not covered)
  • Simple linear least-squares, correlation, basic formulas. (assigned for review) 
References: ARA, Chap. 4, pp. 59-82; Chap. 5, pp. 85-96.
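Moving down Tukey's ladder of powers pulls in a long right tail. This Python sketch assumes the Box-Cox form (x^p - 1)/p, which puts log x at p = 0 and keeps the transformation increasing for negative p; the log-normal sample is simulated, not course data.

```python
import math
import random
import statistics


def power_transform(values, p):
    """Box-Cox-style power transformation (x**p - 1) / p, with log(x) at p = 0.

    Dividing by p keeps the transformation increasing for negative powers,
    so the order of the observations is preserved. Requires x > 0.
    """
    if p == 0:
        return [math.log(v) for v in values]
    return [(v ** p - 1) / p for v in values]


def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by sd**3."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


if __name__ == "__main__":
    rng = random.Random(2001)
    # Hypothetical right-skewed, positive data (log-normal).
    data = [rng.lognormvariate(0, 1) for _ in range(2000)]
    # Move down the ladder from p = 1 (no change) towards p = -1.
    for p in (1, 0.5, 0, -0.5, -1):
        print(f"p = {p:>4}: skewness = {skewness(power_transform(data, p)):6.2f}")
```

For this kind of data, p = 0 (the log) gives skewness near zero, while p = -1 overshoots into a long left tail.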

Links:

Assignment 2 given out: due October 22.
October 8  Thanksgiving
Week 4
October 15
Topics:
  • Simple linear least-squares, correlation, basic formulas. 
  • Visualizing the data ellipse, the least-squares line, correlation, R squared.
References: 
  • ARA, Chap. 5, pp. 97-111; Chap. 9, pp. 204-211; Chap. 6, pp. 112-119.
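The basic simple-regression formulas are b1 = Sxy/Sxx and b0 = ybar - b1 * xbar, with r = Sxy/sqrt(Sxx * Syy). A short Python sketch on hypothetical toy data computes them and checks that, in simple regression, R^2 = 1 - SSE/SST equals r^2.

```python
def simple_ls(x, y):
    """Least-squares intercept, slope, correlation and R**2 for y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # b1 = Sxy / Sxx
    intercept = ybar - slope * xbar    # the line passes through (xbar, ybar)
    r = sxy / (sxx * syy) ** 0.5       # correlation coefficient
    # R**2 as the proportion of the total sum of squares explained.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy
    return intercept, slope, r, r_squared


if __name__ == "__main__":
    # Hypothetical toy data.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    b0, b1, r, r2 = simple_ls(x, y)
    print(round(b0, 4), round(b1, 4), round(r, 4), round(r2, 4), round(r * r, 4))
```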
Links:
Week 5
October 22

Quiz 2
Assignment 2 due

Topics:
  • The Regression Paradox: Galton and Pearson: heights of fathers and sons. The concept of correlation. "Regression to Mediocrity." 
  • Multiple regression. Normal equations. Matrix formulation. Multiple correlation. Meaning of coefficients. 
  • Simple regression versus multiple regression. Marginal versus conditional linear association.
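The normal equations (X'X)b = X'y can be solved directly for small problems. The Python sketch below (no libraries, hypothetical data) does this by Gaussian elimination and also contrasts the marginal slope of x1 with its conditional slope: the response is built as y = 1 + 2*x1 - 3*x2, yet the simple regression of y on x1 alone has a negative slope because x1 and x2 are positively correlated. Statistical software typically uses a QR decomposition instead of forming X'X, which is numerically more stable.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination
    with partial pivoting (works on a copy of a and b)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def least_squares(X, y):
    """Form and solve the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Hypothetical predictors that are strongly (but not perfectly) correlated,
    # and a response built exactly as y = 1 + 2*x1 - 3*x2.
    x1 = [1, 2, 3, 4, 5, 6]
    x2 = [1, 2, 2, 4, 5, 5]
    y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
    X = [[1, a, b] for a, b in zip(x1, x2)]  # column of 1s for the intercept
    print("multiple:", [round(v, 6) for v in least_squares(X, y)])
    # Simple regression of y on x1 alone gives the marginal slope.
    print("simple:  ", [round(v, 6) for v in least_squares([[1, a] for a in x1], y)])
```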
Assignment 3 given out

Links:

Week 6
October 29

Term Test

Topics:
  • Review
Week 7
November 5
Topics:
  • Relationship between simple regression and multiple regression: the chain rule.
  • Statistical inference in simple and multiple regression
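One concrete link between simple and multiple regression is the least-squares identity: marginal slope of y on x1 = b1 + b2 * (slope of x2 on x1), where b1 and b2 are the coefficients from the regression on both predictors. We assume this is the sense of "chain rule" intended here; the identity itself holds exactly for least-squares fits with an intercept. The Python sketch uses the closed-form two-predictor solution on simulated, hypothetical data.

```python
import random


def centered_cross(u, v):
    """Sum of cross-products of deviations from the means."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def two_predictor_slopes(x1, x2, y):
    """Multiple-regression slopes b1, b2 for y on x1 and x2 (with intercept)."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


if __name__ == "__main__":
    rng = random.Random(5)
    # Hypothetical data: x2 depends on x1, and y depends on both.
    x1 = [rng.gauss(0, 1) for _ in range(200)]
    x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
    y = [1.0 + 0.5 * a - 1.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

    b1, b2 = two_predictor_slopes(x1, x2, y)
    a21 = centered_cross(x1, x2) / centered_cross(x1, x1)       # slope of x2 on x1
    marginal = centered_cross(x1, y) / centered_cross(x1, x1)   # simple slope of y on x1
    print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
    print(f"marginal slope {marginal:.4f} = b1 + b2 * a21 = {b1 + b2 * a21:.4f}")
```

With these settings the marginal slope is roughly 0.5 - 1.5 * 0.8 = -0.7 even though b1 is near +0.5: the simple regression absorbs the indirect path through x2.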
Links: 
  • Part 2 of the script begun in Week 5 on the Heart Damage / Coffee / Stress data is Week7b_Coffee_Part2.SSC.
  • The data set used in class to illustrate the principle of Extra Sums of Squares and regression with a dummy variable is available as an Excel file, blf.xls, which you can download (by right-clicking) and import into S-Plus. The script used with this data set is Week7a_BLF.SSC.
References:
  • ARA, Chap. 6, pp. 112-119.
Quiz 3 moved to November 12.
Due date for Assignment 3 moved to November 12.
Assignment 4 given out.
Note: Friday, Nov. 9 is the last day to drop Fall term courses.
Week 8
November 12
Quiz 3
Assignment 3 due
Topics:
  • Relationship between unconditional and conditional effects.
  • Regression with a categorical predictor and a continuous predictor: (Using dummy variables)
    • Dichotomous and polytomous with parallel lines
    • Using interactions to model non-parallel lines.
    • Principle of marginality, hypothesis tests for main effects and interactions
    • Polytomous predictors and one-way anova.
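With a 0/1 dummy d for group membership, the parallel-lines model is y = b0 + b1*x + b2*d, and adding the product x*d lets the slopes differ. Following the principle of marginality, the interaction is tested first, here with an extra-sum-of-squares F statistic comparing the two fits. This Python sketch uses hypothetical simulated data and solves the normal equations directly; the function names are illustrative.

```python
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a @ beta = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def fit(X, y):
    """Least-squares coefficients and residual sum of squares via the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, sse


def interaction_test(x, d, y):
    """Extra-sum-of-squares F for the x:d interaction (parallel vs separate lines)."""
    parallel = [[1, xi, di] for xi, di in zip(x, d)]            # common slope
    separate = [[1, xi, di, xi * di] for xi, di in zip(x, d)]   # group-specific slopes
    _, sse_reduced = fit(parallel, y)
    beta_full, sse_full = fit(separate, y)
    q, df_full = 1, len(y) - 4   # one extra parameter; 4 parameters in the full model
    f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    return beta_full, f_stat


if __name__ == "__main__":
    rng = random.Random(3330)
    # Hypothetical data: dummy d = 1 marks the second group, whose slope differs.
    x = [rng.uniform(0, 10) for _ in range(60)]
    d = [i % 2 for i in range(60)]
    y = [2 + 1.0 * xi + 3 * di + 0.5 * xi * di + rng.gauss(0, 1) for xi, di in zip(x, d)]
    beta, f_stat = interaction_test(x, d, y)
    print("b0, b_x, b_d, b_xd =", [round(b, 3) for b in beta])
    print("F for interaction on (1,", len(y) - 4, ") df:", round(f_stat, 2))
```

The fitted slope for group 1 is b_x + b_xd, which matches a separate simple regression on group 1 alone; the F statistic is referred to an F distribution on 1 and n - 4 degrees of freedom.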
Links:
  • The script begun in Week 5 on the Heart Damage / Coffee / Stress data is Week7b_Coffee_Part2.SSC.
  • The data set used in class to illustrate the principle of Extra Sums of Squares and regression with a dummy variable is available as an Excel file, blf.xls, which you can download (by right-clicking) and import into S-Plus. The script used with this data set is Week7a_BLF.SSC.
  • An example with a dichotomous variable is in Week7a_BLF.SSC, and one with a polytomous variable is in Week8_prestige.SSC.
References:
  • ARA, Chap. 7, pp. 135-154.
Week 9
November 19

Quiz 4
Assignment 4 due

Topics:
  • Empirical vs structural relations
  • Effect of measurement error in a predictor variable
  • Regression with a categorical predictor and a continuous predictor: (Using dummy variables)
    • Dichotomous and polytomous with parallel lines
    • Using interactions to model non-parallel lines.
    • Principle of marginality, hypothesis tests for main effects and interactions
    • Polytomous predictors and one-way anova.
  • Extra sums of squares, balanced vs unbalanced designs.
  • Theory of linear models
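When a predictor is measured with independent error, w = x + u, the least-squares slope of y on w is attenuated toward zero, in large samples, by the reliability ratio var(x)/(var(x) + var(u)). The Python simulation below illustrates this; the model, parameter values and seed are all hypothetical.

```python
import random


def ls_slope(x, y):
    """Least-squares slope of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=2.0, var_x=1.0, var_u=1.0, seed=11):
    """Return (slope using true x, slope using x observed with error).

    Hypothetical model: y = beta * x + e, but only w = x + u is observed,
    with x, u and e independent normals (e has variance 1).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, var_x ** 0.5) for _ in range(n)]
    w = [xi + rng.gauss(0, var_u ** 0.5) for xi in x]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    return ls_slope(x, y), ls_slope(w, y)


if __name__ == "__main__":
    true_x_slope, noisy_x_slope = simulate()
    reliability = 1.0 / (1.0 + 1.0)  # var_x / (var_x + var_u) for the defaults
    print(f"slope on x: {true_x_slope:.3f}, slope on w: {noisy_x_slope:.3f}, "
          f"expected about {2.0 * reliability:.3f}")
```

With var(x) = var(u) = 1 the reliability is 1/2, so a true slope of 2 is estimated at about 1.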



Week 10
November 26
Topics:
  • Properties of LS estimation: BLUE and UMVUE
  • The general linear hypothesis.
  • Regression diagnostics, outliers and influential observations
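For simple regression the hat values have the closed form h_i = 1/n + (x_i - xbar)^2 / Sxx, and Cook's distance combines leverage with the residual: D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2), with p = 2 coefficients. The Python sketch uses hypothetical data in which one point lies far out in x and below the trend of the others.

```python
def leverage_and_cooks(x, y):
    """Hat values and Cook's distances for a simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    p = 2                                       # number of coefficients (intercept, slope)
    s2 = sum(e ** 2 for e in resid) / (n - p)   # residual variance estimate
    hat = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [e ** 2 * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, hat)]
    return hat, cooks


if __name__ == "__main__":
    # Hypothetical data: the last point is far out in x and off the trend.
    x = [1, 2, 3, 4, 5, 6, 7, 20]
    y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.0]
    hat, cooks = leverage_and_cooks(x, y)
    for xi, h, d in zip(x, hat, cooks):
        print(f"x = {xi:>2}  leverage = {h:.3f}  Cook's D = {d:.3f}")
```

The hat values sum to p = 2. Cut-offs such as h_i > 2p/n are conventional rules of thumb rather than formal tests.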
Links:
Week 11
December 3

Last class
Quiz 5
Reports due

Topics:
  • Regression diagnostics: nonlinearity, heteroskedasticity, non-normality
  • Collinearity
  • Polynomial regression
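In polynomial regression, x and x^2 tend to be highly correlated, which inflates the variances of their coefficients. With exactly two predictors the variance inflation factor for each is 1/(1 - r^2), where r is their correlation (in general VIF_j = 1/(1 - R_j^2)). The Python sketch, using hypothetical equally spaced x values, shows that centering x before squaring removes the correlation.

```python
def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


def vif_two_predictors(u, v):
    """Variance inflation factor when a model has exactly two predictors: 1 / (1 - r**2)."""
    r = correlation(u, v)
    return 1.0 / (1.0 - r ** 2)


if __name__ == "__main__":
    x = [float(i) for i in range(1, 11)]       # hypothetical predictor values 1..10
    raw = vif_two_predictors(x, [xi ** 2 for xi in x])
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]               # center x before squaring
    centered = vif_two_predictors(xc, [c ** 2 for c in xc])
    print(f"VIF for x and x^2: {raw:.1f}; after centering: {centered:.3f}")
```

Here the VIF drops from about 20 to 1; centering changes the meaning of the linear coefficient but not the fitted quadratic curve.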

Final Exam
December 9
Sunday, December 9, 7 pm to 10 pm in S137 Ross.

Interesting things to look at: