**Statistics Seminar #1**

Stefan H. Steiner, University of Waterloo, will give a talk entitled "**Introduction to Data Mining**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*In this talk we shall explore the emerging area of data mining (also called knowledge discovery in databases). Data mining is loosely defined as the (automatic) extraction of novel information from very large databases. It has been applied in a wide variety of areas, including business (marketing, finance, etc.), medicine, science and government. This talk presents a broad overview of the goals of data mining and the typical techniques used to accomplish them. We also discuss some interesting challenges that data mining presents to statisticians.*

Biographical Sketch: Stefan H. Steiner is an assistant professor in the Department of Statistics and Actuarial Science at the University of Waterloo. He obtained his Ph.D. in Management Science from McMaster University in 1994, and his M.Sc. and B.Math. degrees from the University of British Columbia and the University of Waterloo, respectively.

Stefan's research interests include industrial statistics and applications of statistics in operations research and management science. Stefan is also an active consultant who has worked with a wide variety of organizations including General Motors Canada, Nortel Networks, Wescast, Petro Canada, Atlantis Aerospace, Woodbridge Foam, Eaton Yale, Plexus, the US Army, the State of Utah and some municipal governments. The consulting projects involved statistical process control, quality improvement, data analysis and process re-engineering.

**Statistics Seminar #2**

Duncan Murdoch, University of Western Ontario, will speak on "**Perfect Sampling: Not Just for Markov Chains?**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*Propp and Wilson's (1996) coupling from the past (CFTP) algorithm generates a sample from the limiting distribution of a Markov chain. In this talk I argue that the underlying idea of CFTP is more widely applicable, and will demonstrate attempts to apply it to two problems: approximation of limits and simulation of stochastic differential equations.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
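To make the CFTP idea in Seminar #2 concrete, here is a minimal Python sketch of monotone coupling from the past for a toy chain of our own choosing (not one from the talk): a random walk on {0, ..., n} that steps down or up with probability 1/2 and holds at the ends, whose stationary distribution is uniform. The chain and the function name `monotone_cftp` are illustrative assumptions.

```python
import random
from collections import Counter

def monotone_cftp(n, rng):
    """Return one exact draw from the stationary law of a toy random walk on
    {0, ..., n} (step down/up w.p. 1/2, hold at the ends) using monotone
    coupling from the past. The chain is a hypothetical example."""

    def update(x, u):
        # Deterministic update driven by a shared uniform u; monotone in x.
        if u < 0.5:
            return max(x - 1, 0)  # step down, holding at 0
        return min(x + 1, n)      # step up, holding at n

    us = []  # us[k] drives the step from time -(k+1) to time -k
    t = 1
    while True:
        while len(us) < t:
            us.append(rng.random())  # fresh randomness only for earlier times
        lo, hi = 0, n  # bottom and top chains started at time -t
        for k in range(t - 1, -1, -1):
            lo = update(lo, us[k])
            hi = update(hi, us[k])
        if lo == hi:
            # Every start state is sandwiched between lo and hi, so all agree.
            return lo
        t *= 2  # restart further in the past, reusing the same uniforms


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(monotone_cftp(4, rng) for _ in range(5000))
    print(sorted(counts.items()))  # roughly 1000 draws per state
```

Reusing the stored uniforms when restarting from further back is essential to CFTP; drawing fresh ones at each restart would bias the output.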

**Statistics Seminar #3**

Professor Nancy Reid, University of Toronto, will speak on "**Some Aspects of Matching Priors.**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*Priors for which Bayesian and frequentist inference agree (at least to some order of approximation) are called 'matching priors', and have been proposed as candidates for noninformative priors in Bayesian inference. I will present some recent work on various aspects of the matching problem, with applications to $p$-values, confidence limits, and tolerance limits.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
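As a hedged illustration of probability matching (a textbook example we chose, not necessarily one from the talk): for an exponential sample with rate λ, the Jeffreys prior π(λ) ∝ 1/λ gives a one-sided posterior upper limit whose frequentist coverage equals its nominal level exactly, whereas a flat prior over-covers. The sketch checks this by simulation; all function names and settings are hypothetical.

```python
import math
import random

def erlang_cdf(x, n):
    """CDF of Gamma(shape=n, rate=1) for integer n >= 1."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def erlang_quantile(p, n):
    """Invert erlang_cdf by bisection."""
    lo, hi = 0.0, 50.0 + 10.0 * n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def upper_limit_coverage(n, a, lam=1.5, level=0.95, reps=20000, seed=1):
    """Frequentist coverage of the posterior `level` upper limit for lam.

    Data: n i.i.d. Exponential(rate=lam) observations with sum S.
    Prior proportional to lam**(a - 1): a = 0 is Jeffreys (1/lam), a = 1 is flat.
    Posterior: Gamma(shape=n + a, rate=S), so the upper limit is q / S.
    """
    q = erlang_quantile(level, n + a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(rng.expovariate(lam) for _ in range(n))
        if lam <= q / s:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("Jeffreys:", upper_limit_coverage(3, a=0))  # close to 0.95
    print("flat:    ", upper_limit_coverage(3, a=1))  # noticeably above 0.95
```

Here the matching is exact because λS is a pivotal quantity; in general, matching priors agree with frequentist inference only to some order of approximation, as the abstract says.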

Professor Nancy Reid, our next speaker, is an internationally recognised statistician and one of the best-known statisticians in the world. Nancy has a long list of honours, including the Committee of Presidents of Statistical Societies (COPSS) Presidents' Award, which is regarded as the statistical version of the Fields Medal. This year she will deliver the Wald Lecture at the meeting of the IMS (Institute of Mathematical Statistics); the Wald Lecture is the most important event of the IMS meeting. To emphasize its importance, here are some of the past Wald lecturers: Samuel Karlin, Bradley Efron, Peter Bickel, Peter Huber, L.D. Brown and Ulf Grenander. This is an impressive list of big guns in statistics, and having Nancy's name on it is indeed an honour for the statistics community in Canada.

**Statistics Seminar #4**

Allan Donner, Department of Epidemiology and Biostatistics, University of Western Ontario, will speak on "**Issues of Interpretation Arising from Multiple Subgroups in Clinical and Epidemiologic Research**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*It is natural in a clinical trial or an epidemiologic study to investigate the effect of intervention or exposure in different subgroups of subjects. However, problems of interpretation may result if such analyses are not pre-specified and if issues involving multiplicity, estimation bias and type II errors are not taken into account. In this talk, I discuss strategies that may be adopted to address such problems, using recent examples from the clinical and epidemiologic literature to illustrate controversies that may arise.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
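A small Python sketch of the multiplicity problem raised in Seminar #4's abstract, under the simplifying assumption (ours) of k independent subgroup tests with no true effect in any subgroup: the chance of at least one nominally significant subgroup is 1 − (1 − α)^k, and a Bonferroni adjustment (one standard remedy, not necessarily one the speaker recommends) tests each subgroup at α/k. Function names are hypothetical.

```python
import random

def familywise_error(k, alpha=0.05):
    """P(at least one p-value < alpha) among k independent null tests."""
    return 1.0 - (1.0 - alpha) ** k

def simulate_familywise_error(k, alpha=0.05, bonferroni=False, reps=20000, seed=2):
    """Monte Carlo check; under the null each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    level = alpha / k if bonferroni else alpha
    hits = 0
    for _ in range(reps):
        if any(rng.random() < level for _ in range(k)):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    k = 10
    print(familywise_error(k))                            # about 0.40
    print(simulate_familywise_error(k))                   # about 0.40
    print(simulate_familywise_error(k, bonferroni=True))  # about 0.05 or less
```

With ten unplanned subgroup analyses, roughly a 40% chance of a spurious "finding" is already built in before any real effect is considered.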

**Statistics Seminar #5**

Alison Gibbs, Department of Mathematics and Statistics, York University, will speak on "**Convergence of Markov Chain Monte Carlo Algorithms with Applications to Image Restoration**" from 11:30a.m. to 12:30p.m. in N638 Ross.

ABSTRACT:

*Markov Chain Monte Carlo (MCMC) algorithms, such as the Gibbs sampler and Metropolis-Hastings algorithm, are now widely used for exploring complicated probability distributions. A critical issue for users of these algorithms is the determination of the number of iterations required so that the result will be approximately a sample from the distribution of interest.*

*In this talk, I will show how the idea of coupled Markov chains can be used to obtain precise bounds on the convergence time of MCMC algorithms. I will consider convergence in both total variation distance, which is the usual choice of probability metric, and in the Wasserstein metric. These methods can also be applied to bounding the running time of Propp and Wilson's (1996) coupling-from-the-past algorithm.*

*As a particular application, I will describe the use of MCMC in the Bayesian approach to the restoration of a degraded image.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
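The coupling idea in Seminar #5's abstract rests on the coupling inequality: if two copies of a chain share randomness and one starts in stationarity, the total variation distance at time t is at most the probability that they have not yet met. The Python sketch below checks this on a toy chain (our assumption, not the image-restoration sampler from the talk): a random walk on {0, ..., n} that steps ±1 with probability 1/2 and holds at the ends. Because its update is monotone, the chains started at 0 and n sandwich every other copy, so their non-meeting probability bounds the worst-case distance. Only total variation is covered, not the Wasserstein metric; all names are hypothetical.

```python
import random

def step(x, u, n):
    """Shared monotone update: down if u < 1/2, else up; hold at 0 and n."""
    return max(x - 1, 0) if u < 0.5 else min(x + 1, n)

def worst_tv(n, t):
    """Exact max over start states x of TV(P^t(x, .), uniform)."""
    size = n + 1
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        P[i][max(i - 1, 0)] += 0.5
        P[i][min(i + 1, n)] += 0.5
    pi = 1.0 / size  # P is doubly stochastic, so the uniform law is stationary
    worst = 0.0
    for x in range(size):
        dist = [1.0 if j == x else 0.0 for j in range(size)]
        for _ in range(t):
            dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
        worst = max(worst, 0.5 * sum(abs(d - pi) for d in dist))
    return worst

def coupling_bound(n, t, reps=10000, seed=3):
    """Monte Carlo estimate of P(T > t), where T is the meeting time of the
    chains started at 0 and n driven by the same uniforms."""
    rng = random.Random(seed)
    not_met = 0
    for _ in range(reps):
        lo, hi = 0, n
        for _ in range(t):
            u = rng.random()
            lo, hi = step(lo, u, n), step(hi, u, n)
        if lo != hi:
            not_met += 1
    return not_met / reps


if __name__ == "__main__":
    for t in (5, 10, 20, 40):
        print(t, round(worst_tv(5, t), 4), coupling_bound(5, t))
```

Reading off the first t at which the estimated bound drops below a tolerance gives a (conservative) number of iterations, which is the practical question the abstract raises.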

**Statistics Seminar #6**

David B. Wolfson, Department of Mathematics and Statistics, McGill University, will speak on "**Length-biased Sampling With An Application To Assessing Survival With Dementia**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*Left truncation of survival data occurs when subjects whose survival times are shorter than either fixed or random quantities are simply not observed. Often in epidemiologic studies, prevalent cases with a disease are identified through a cross-sectional study carried out over a short time period. These cases are then followed for a fixed time period, at the end of which the subjects will either have failed or have been censored. When interest lies in estimating the survival distribution, from onset, of subjects with the disease, one must take into account that the survival times of the cases identified in such a study are left truncated or length-biased: the cases identified at the start of the study tend to be the long survivors. I shall give a brief overview of length bias and discuss how one may estimate the "true" unbiased distribution from length-biased data. In particular, I shall propose an unconditional approach that, while requiring stronger assumptions than the traditional conditional method in the literature, produces narrower confidence intervals and hence more precise inference. The problem of estimating the survival distribution of subjects identified with dementia, including Alzheimer's disease, threads its way through this talk, which has a surprising conclusion.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
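A minimal Python sketch of the length-bias correction idea in Seminar #6's abstract, under assumptions chosen for illustration: true survival times from onset are Exponential with mean 2, sampling is purely length-biased (observed density x f(x)/μ), and there is no censoring. Weighting each observed duration by 1/x undoes the bias; this is the classical weighting estimator, not the unconditional approach proposed in the talk. Function names are hypothetical.

```python
import random

def length_biased_exponential(n, mean, rng):
    """Length-biased draws when the true duration is Exponential(mean).
    The density x * f(x) / mean is Gamma(shape=2, scale=mean)."""
    return [rng.gammavariate(2.0, mean) for _ in range(n)]

def corrected_mean(xs):
    """E[1/X] = 1/mu under length bias, so the harmonic mean estimates mu."""
    return len(xs) / sum(1.0 / x for x in xs)

def corrected_survival(xs, t):
    """Estimate the unbiased P(T > t) by weighting each observation by 1/x."""
    weights = [1.0 / x for x in xs]
    return sum(w for w, x in zip(weights, xs) if x > t) / sum(weights)


if __name__ == "__main__":
    rng = random.Random(4)
    xs = length_biased_exponential(20000, 2.0, rng)
    print(sum(xs) / len(xs))            # naive mean, near 4 (twice the truth)
    print(corrected_mean(xs))           # near the true mean 2
    print(corrected_survival(xs, 2.0))  # near exp(-1), about 0.368
```

The naive mean doubles the true mean here, which is exactly the over-representation of long survivors that the abstract warns about.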

**Statistics Seminar #7**

Luc Devroye, McGill University, will speak on "**Model Selection in Density Estimation**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*Density estimates are just smoothed versions of the empirical measure. The main problem in density estimation thus is the data-based choice of an estimate from a class of estimates. Particular instances include the choice of a bandwidth in kernel estimation, the choice of a threshold level for wavelet estimates, the choice of a parameter in the Box-Cox transformation (when used before a kernel or histogram estimate), and the choice of the number of basis functions in a series estimate. We propose a general method that has the property that for all densities, under conditions limiting the size of the complexity of the classes of estimates, the expected L1 error is not more than about 3 times the optimal L1 error (the error corresponding to the estimate from the class if the density were revealed to us beforehand).*

This is joint work with Gabor Lugosi.

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
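To make the bandwidth-selection problem in Seminar #7's abstract concrete, the Python sketch below picks a kernel bandwidth by a minimum-distance rule over Yatracos sets {x : f_j(x) > f_k(x)}, with the estimates built on one half of the data and the empirical measure taken from the other half. We assume this rule is close in spirit to the Devroye-Lugosi method; the sketch does not demonstrate their factor-of-3 guarantee. The Gaussian kernel, candidate bandwidths, integration grid, and all names are our choices.

```python
import math
import random

def kde(train, h):
    """Gaussian kernel density estimate with bandwidth h."""
    c = 1.0 / (len(train) * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train)

def min_distance_bandwidth(train, test, bandwidths, grid):
    """Choose the bandwidth whose estimate best matches the empirical measure
    of `test` uniformly over all Yatracos sets A_jk = {x : f_j(x) > f_k(x)}."""
    dx = grid[1] - grid[0]
    fs = [kde(train, h) for h in bandwidths]
    on_grid = [[f(x) for x in grid] for f in fs]  # for Riemann-sum integrals
    on_test = [[f(x) for x in test] for f in fs]  # set membership of test points
    m = len(fs)
    scores = []
    for i in range(m):
        worst = 0.0
        for j in range(m):
            for k in range(m):
                if j == k:
                    continue
                in_set = [a > b for a, b in zip(on_grid[j], on_grid[k])]
                mass = dx * sum(v for v, s in zip(on_grid[i], in_set) if s)
                emp = sum(a > b for a, b in zip(on_test[j], on_test[k])) / len(test)
                worst = max(worst, abs(mass - emp))
        scores.append(worst)
    best = min(range(m), key=lambda idx: scores[idx])
    return bandwidths[best], scores


if __name__ == "__main__":
    rng = random.Random(5)
    train = [rng.gauss(0.0, 1.0) for _ in range(200)]
    test = [rng.gauss(0.0, 1.0) for _ in range(200)]
    grid = [-6.0 + 0.01 * g for g in range(1201)]
    h, scores = min_distance_bandwidth(train, test, [0.02, 0.1, 0.3, 0.6, 2.0], grid)
    print(h, [round(s, 3) for s in scores])
```

The rule never needs the true density: it only compares each candidate's mass with held-out data on sets defined by the candidates themselves.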

**Statistics Seminar #8**

Keith Worsley, McGill University, will speak on "**A Test For A Conjunction**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*A conjunction is defined in the brain mapping literature as the occurrence of the same event at the same location in two or more independent 3D brain images. The images are smooth isotropic 3D random fields of test statistics, and the event occurs when the image exceeds a fixed high threshold. We give a simple approximation to the probability of a conjunction occurring anywhere in a fixed region, so that we can test for a local increase in mean of the images at the same unknown location in all images, a generalization of the split-t test. This is a corollary to a more general result on the expected Minkowski functionals of the set of points where a conjunction occurs.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
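A hedged Python sketch of the pointwise version only: if k independent images of standard Gaussian test statistics are compared at a single fixed voxel, a conjunction there means the minimum of the k statistics exceeds t, which has probability P(Z > t)^k. The talk's contribution, approximating the probability of a conjunction anywhere in a search region via expected Minkowski functionals, is not reproduced here. Gaussian statistics and the function names are our assumptions.

```python
import math
from statistics import NormalDist

def gaussian_tail(t):
    """P(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def pointwise_conjunction_p(t, k):
    """P(all k independent statistics exceed t at one fixed voxel)."""
    return gaussian_tail(t) ** k

def pointwise_conjunction_threshold(p, k):
    """Threshold t with pointwise_conjunction_p(t, k) == p."""
    return NormalDist().inv_cdf(1.0 - p ** (1.0 / k))


if __name__ == "__main__":
    print(pointwise_conjunction_threshold(0.05, 1))  # about 1.645
    print(pointwise_conjunction_threshold(0.05, 2))  # about 0.76
```

Because a conjunction requires every image to exceed t, the threshold needed for a given pointwise level drops as k grows; searching over a whole region raises it again, which is what the random-field approximation accounts for.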

**Statistics Seminar #9**

J.N.K. Rao, Carleton University, will speak on "**Small Area Estimation: Methods and Applications**" from 10:30a.m. to 11:30a.m. in N638 Ross.

ABSTRACT:

*Reliable small area statistics are needed in formulating policies and programs, allocating government funds, market research and so forth. Traditional area-specific direct survey estimators are not suitable for this purpose because the sample size in a small area is typically too small to provide estimators with acceptable precision. It is therefore necessary to use indirect estimators that borrow data from related small areas to increase the effective sample size and thus the precision. In this talk I will give an overview of recent model-based methods, in particular empirical Bayes and hierarchical Bayes methods. Techniques for measuring the variability of the estimators will also be discussed. Recent applications of model-based methods will be presented.*

Everyone is encouraged to attend. Statistics graduate students are expected to attend.
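A minimal Python sketch of "borrowing strength" from Seminar #9's abstract, assuming (our choice) a Fay-Herriot-style area-level model without covariates: each direct estimate y_i has known sampling variance D_i, and true area means vary around a common mean with variance A. The empirical Bayes estimate shrinks y_i toward the overall mean with weight γ_i = A/(A + D_i) on the direct estimate, with A estimated by a simple moment method. This is an illustration, not the specific methods of the talk, and it omits MSE estimation. Names and simulation settings are hypothetical.

```python
import math
import random

def eb_estimates(y, D):
    """Empirical Bayes estimates under y_i = theta_i + e_i, e_i ~ N(0, D_i),
    theta_i = mu + v_i, v_i ~ N(0, A), with mu and A estimated by moments.
    All D_i are assumed known and positive."""
    m = len(y)
    ybar = sum(y) / m
    s2 = sum((yi - ybar) ** 2 for yi in y) / (m - 1)
    A = max(0.0, s2 - sum(D) / m)  # rough moment estimate, truncated at 0
    est = []
    for yi, Di in zip(y, D):
        gamma = A / (A + Di)       # weight on the area's own direct estimate
        est.append(gamma * yi + (1.0 - gamma) * ybar)
    return est

def simulate_mse(m=60, A=0.5, seed=6):
    """Compare average squared error of direct and EB estimates."""
    rng = random.Random(seed)
    D = [1.0 if i % 2 == 0 else 2.0 for i in range(m)]  # known sampling variances
    theta = [10.0 + rng.gauss(0.0, math.sqrt(A)) for _ in range(m)]
    y = [t + rng.gauss(0.0, math.sqrt(d)) for t, d in zip(theta, D)]
    eb = eb_estimates(y, D)
    mse_direct = sum((yi - t) ** 2 for yi, t in zip(y, theta)) / m
    mse_eb = sum((e - t) ** 2 for e, t in zip(eb, theta)) / m
    return mse_direct, mse_eb


if __name__ == "__main__":
    print(simulate_mse())  # EB error should be well below the direct error
```

Areas with noisier direct estimates (larger D_i) are shrunk more strongly toward the overall mean, which is how the model "increases the effective sample size" for small areas.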