Michael Friendly

Psychology Department

York University

There is also:

- An online, web application, with several sets of sample data. You can submit your own data through a form or an uploaded data file.
- A brief tutorial introduction to mosaic displays.
- A compressed Postscript version of this document (mosaics.ps.gz, 107K) and a PDF version (mosaics.pdf).
**Note: This HTML version is no longer maintained. The Postscript and PDF versions are more up to date.**

This report describes MOSAICS, a collection of SAS/IML programs and macros for producing mosaic displays. The program has the following features:

- It produces graphical displays of an n-way contingency table of any size. Experience shows that tables of up to 5 or 6 dimensions can be usefully explored. The main limitation is in the resolution of the display with large, complex tables.
- The order of variables in the mosaic is specified by the user. Different orderings of the variables can show different aspects of the data.
- For an unordered factor, the order of its levels can be determined to enhance understanding of the pattern of association. This ordering can be found from a correspondence analysis of the residuals from a model of independence.
- The program can produce sequential displays of the marginal subtables, *[A], [AB], [ABC]*, and so forth, up to the full n-way table, where *A, B, C, ...* refer to the table variables in the order entered.
- For each display, the program fits a log-linear model and depicts the residuals from the model by the color and shading of tiles in the mosaic.
- The program can automatically construct and fit a wide set of baseline models of independence or partial independence among the table variables. A shorthand keyword is used to specify many models of interest. Alternatively, the user can specify and fit any log-linear model which can be estimated by iterative proportional fitting.
- The program can perform a correspondence analysis on marginal subtables to suggest a reordering of the levels of each variable.
- Models and tables with structural zeros are accommodated naturally.
- A contingency table can be read from a SAS data set or entered in SAS/IML as a table of frequencies together with variable names and factor level values. A collection of sample contingency tables in this format is supplied.
- A SAS macro, `mac/mosaic.sas`, provides a more easily used interface to the SAS/IML modules.
- Other SAS/IML modules extend the idea of mosaic displays to mosaic matrices (`mosmat.sas`), both marginal and conditional, and partial mosaic plots (`mospart.sas`). Partial mosaics are included in the `mac/mosaic.sas` macro; mosaic matrices have their own macro (`mac/mosmat.sas`).

- Copy the files MOSAICS.SAS and MOSAICM.SAS to a directory (`'~/sasuser/mosaics/'` or `'c:\sasuser\mosaics\'`, say).
- Edit the `libname` and `filename` statements to correspond to this directory. On a Unix system, these might be:

      *-- Change the path in the following filename statement to point to
          the installed location of mosaics.sas;
      filename mosaics '~/sasuser/mosaics/';
      *--- Change the path in the libname to point to where the compiled
           modules will be stored, ordinarily the same directory;
      libname mosaic '~/sasuser/mosaics/';

  On Windows:

      filename mosaics 'c:\sasuser\mosaics\';
      libname mosaic 'c:\sasuser\mosaics\';

- You may wish to change some of the program default values (in the module `globals` in MOSAICS.SAS), particularly the `font=` value, which is set to `font='hwpsl009'` (Helvetica for the PS driver) in the distribution copy.
- Run the MOSAICM.SAS program with the command

      sas mosaicm

- Optionally, install the sample data sets (see "Sample data sets") by running `sas mosdata`.

These steps need only be done once.

In applications, the modules are loaded into the SAS/IML
workspace with the `load` or `%include`
statement, as follows,

    libname mosaic '~/sasuser/mosaics';
    proc iml;
       reset storage=mosaic.mosaic;
       load module=_all_;

On most platforms, a

Alternatively, it is possible to store and use the program in
source form. This avoids the need to maintain and access the SAS/IML
catalog, but means that the program is compiled each time it is run.
To use the program in this way, simply access the program with a
`%include` statement:

    filename mosaics 'path/to/mosaics.sas';
    proc iml;
       %include mosaics;

On some platforms you may need to add a path specification to the

    libname mosaic '~/sasuser/mosaics' access=readonly;

You can place this statement in the system-wide

Alternatively, copy the MOSAICS.SAS file to any public (readable)
directory, and instruct users to load it using the
`%include` statement, as described above.

If you are using IML, the contingency table can either be defined directly with IML statements, or input from a SAS data set. The macro reads data from a SAS data set.

    proc iml symsize=256;
       reset storage=mosaic.mosaic;
       load module=_all_;
       *-- specify data parameters;
       levels = { ... };      *-- variable levels;
       table  = { ... };      *-- contingency table;
       vnames = { ... };      *-- variable names;
       ...
       *-- specify non-default global inputs;
       fittype='USER';
       config = { 1 1,
                  2 3 };
       run mosaic(levels, table, vnames, lnames, plots, title);

The n-way contingency table to be analyzed is specified by the
`table` parameter; the names of the dimension (factor) variables
and the names of the values that the dimension variables take on are
specified in the `vnames` and `lnames` parameters,
respectively, as described below.

In situations where the contingency table and factor variables are
available in a SAS dataset,
the `table`, `levels`, and `lnames` matrices may be constructed with the
`readtab` module, described in
Dataset Input.
The parameters for the `run mosaic` statement are:

- `levels` is a vector which specifies the number of variables and the dimensions of the contingency table. If `levels` is *n x 1*, then the table has *n* dimensions, and the number of levels of variable *i* is `levels[i]`. The order of the variables in `levels` is the order in which they are entered into the mosaic display.
- `table` is a matrix or vector giving the frequency, $f_{ij\ldots}$, of observations in each cell of the table. The table variables are arranged in accordance with the conventions of the SAS/IML `IPF` and `MARG` functions, so the **first** variable varies most rapidly across the columns of `table` and the last variable varies most slowly down the rows.

  In addition, `table` must conform to `levels` as follows. If `table` is *I* rows by *J* columns, the product of all entries in `levels` must be *IJ*. Moreover, *J* must equal the product of the first *k* entries of `levels`, for some *k*. That is, the columns must correspond to the combinations of one or more of the first *k* factors.
- `vnames` is a *1 x n* character vector of variable (factor) names, in an order corresponding to `levels`.
- `lnames` is a character matrix of labels for the variable levels, one row for each variable. The number of columns is the maximum value in `levels`. When the numbers of levels are unequal, the rows for smaller factors must be padded with blank entries.
- `plots` is a vector containing any of the integers 1 to *n*, which specifies the list of marginal tables to be plotted. If `plots` contains the value *i*, the marginal subtable for variables 1 to *i* will be displayed. For a 3-way table, `plots={1 2 3}` displays each sequential plot, showing the [A], [AB] and [ABC] marginal tables, while `plots=3` displays only the final 3-way [ABC] mosaic.
- `title` is a character string or vector of strings containing title(s) for the plots. If `title` is a single character string, it is used as the title for all plots. Otherwise, `title` may be a vector of up to `max(plots)` strings, and `title[i]` is used as the title for the plot produced by `plots[ ] = i`. If the number of strings is less than `max(plots)`, the last string is used for all remaining plots.

  Moreover, if the title for a given plot contains the string `&MODEL` (upper case), that string is replaced by the symbolic model description. Similarly, the string `&G2` (or `&X2`) is replaced by the LR (Pearson) chisquare value and df for the current model, in the form 'G2 (df) = value'. Enclose such titles in **single quotes**; otherwise the SAS macro processor will complain about an 'Apparent symbolic reference'. For example, the specifications

      plots = 2:3;
      fittype='JOINT';
      title = { '',
                'Hair-color Eye-color Data Model (H)(E)',
                'Hair-color Eye-color Data Model (HE)(S)'};

  produce two plots with titles from `title[2]` and `title[3]` (1). Equivalent results (using substitution) are produced with the single title

      title = 'Hair-color Eye-color Data Model &MODEL';

------------------------

(1) SAS/GRAPH fonts do not produce brackets,`[ ]`and braces,`{ }`. Use parentheses instead in model symbolic formulae.

------------------------
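The layout rules for `table` and `levels` can be checked outside SAS. The following Python sketch is not part of MOSAICS: the function names `conforms` and `cell_position` are hypothetical, and indices are 0-based here, whereas IML is 1-based. It assumes the layout described for `table` (first variable varying fastest across the columns), and it treats a single-column table as the case *k* = 0, which is an interpretation rather than something the text states.

```python
from math import prod

def conforms(levels, nrow, ncol):
    """Return k if an nrow x ncol table matches `levels`, else None.

    k is the number of leading variables whose combinations index the columns.
    """
    if prod(levels) != nrow * ncol:
        return None
    running = 1
    for k in range(len(levels) + 1):
        if running == ncol:
            return k
        if k < len(levels):
            running *= levels[k]
    return None

def cell_position(levels, index, ncol):
    """Map a 0-based cell index (one entry per variable) to (row, col).

    The first variable varies fastest, so it gets stride 1.
    """
    flat, stride = 0, 1
    for i, n_levels in zip(index, levels):
        flat += i * stride
        stride *= n_levels
    return divmod(flat, ncol)

# Hair (4) x Eye (4) x Sex (2) stored as a 2 x 16 matrix
levels = [4, 4, 2]
print(conforms(levels, 2, 16))                # 2: the columns index Hair x Eye
print(cell_position(levels, (1, 2, 1), 16))   # (1, 9)
```

A 4 x 8 layout of the same 32 cells would be rejected, because 8 is not a product of leading entries of `levels`.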

- `colors` is a character vector of one or two elements specifying the colors used for positive and negative residuals. The default is `{BLUE RED}`. For a monochrome display, specify `colors='BLACK'` and use two distinct fill patterns for the fill type, such as `filltype={M0 M45}` or `filltype={GRAY M45}`.
- `config` is a numeric or character matrix specifying which marginal totals to fit when `fittype='USER'` is also specified. `config` is ignored for all other fit types. Each column specifies a high-order marginal in the model, either by the names of the variables or by their indices, according to their order in `vnames`. For example, the log-linear model *[AB] [AC] [BC]* for a three-way table is specified by the 2 by 3 matrix

      config = { 1 1 2,
                 2 3 3};

  or

      config = { A A B,
                 B C C};

  The same model can be specified more easily row-wise, and then transposed:

      config = t( {1 2, 1 3, 2 3} );

- `devtype` {*GF* | LR | FT} is a character string which specifies the type of deviations (residuals) to be represented by shading. `devtype='GF'` is the default.
  - `GF` calculates components of the Pearson goodness of fit chisquare, $d_{ij} = (f_{ij} - \hat{m}_{ij}) / \sqrt{\hat{m}_{ij}}$, where $\hat{m}_{ij}$ is the estimated expected frequency under the model.
  - `LR` calculates components of the likelihood ratio (deviance) chisquare.
  - `FT` calculates Freeman-Tukey residuals.

- `fittype` {*JOINT* | MUTUAL | CONDIT | PARTIAL | MARKOV | USER} is a character string which specifies the type of sequential log-linear models to fit. `fittype='JOINT'` is the default. For two-way tables (or two-way margins of larger tables), all fit types fit the independence model.
  - `JOINT` *k* specifies sequential models of joint independence, *[A][B], [AB][C], [ABC][D], ...* These models specify that the last variable in a given plot is independent of all previous variables jointly.

    Optionally, the keyword `JOINT` may be followed by a digit, *k*, to specify which of the *n* ordered variables is independent of the rest jointly.
  - `MUTUAL` specifies sequential models of mutual independence, *[A][B], [A][B][C], [A][B][C][D], ...*
  - `CONDIT` *k* specifies sequential models of conditional independence which hypothesize that all previous variables are independent, given the last, i.e., *[A][B], [AC][BC], [AD][BD][CD], ...* For the 3-way model, A and B are hypothesized to be conditionally independent, given C; for the 4-way model, A, B, and C are conditionally independent, given D.

    Optionally, the keyword `CONDIT` may be followed by a digit, *k*, to specify which of the *n* ordered variables is conditioned upon.
  - `PARTIAL` specifies sequential models of partial independence of the first pair of variables, conditioning on all remaining variables one at a time: *[A][B], [AC][BC], [ACD][BCD], ...* For the 3-way model, A and B are hypothesized to be conditionally independent, given C; for the 4-way model, A and B are conditionally independent, given C and D.
  - `MARKOV` *k* specifies a sequential series of Markov chain models fit to the table, whose dimensions are assumed to represent discrete ordered time points, such as lags in a sequential analysis (e.g., Lag0, Lag1, Lag2, ...). The keyword `MARKOV` can optionally be followed by a digit to specify the order of the Markov chains, e.g., `fittype='MARKOV2';` specifies a second-order Markov chain. First-order is assumed if not specified. `MARKOV` (or `MARKOV1`) fits the models *[A][B], [AB][BC], [AB][BC][CD], ...*, where the categories at each lag are associated only with those at the previous lag. `MARKOV2` fits the models *[A][B], [A][B][C], [ABC][BCD], [ABC][BCD][CDE], ...*
  - `USER`: if `fittype='USER'`, specify the hypothesized model in the global matrix `config`. The models for plots of marginal tables are based on reducing the hypothesized configuration, eliminating all variables not participating in the current plot.

- `filltype` {M45 | LR | M0 | GRAY | *HLS*} is a character vector of one or two elements which specifies the type of fill pattern to use for shading. `filltype[1]` is used for positive residuals; `filltype[2]`, if present, is used for negative residuals. If only one value is specified, a complementary value for negative residuals is generated internally. `filltype={HLS HLS}` is the default.
  - `M45` uses SAS/GRAPH patterns `MdN135` and `MdN45` with hatching at 45 and 135°. `d` is the density value determined from the residual and the `shade` parameter.
  - `LR` uses SAS/GRAPH patterns `Ld` and `Rd`.
  - `M0` uses SAS/GRAPH patterns `MdN0` and `MdN90` with hatching at 0 and 90°.
  - `GRAY` *step* uses solid, greyscale fill using the patterns `GRAY`*nn*, starting from `GRAYF0` for density=1 and increasing darkness by *step* for each successive density level. The default for *step* is 16, so `'GRAY'` gives `GRAYF0`, `GRAYE0`, `GRAYD0`, and so forth.
  - `HLS` uses solid, color-varying fill based on the HLS color scheme. The colors are selected attempting to vary the lightness in approximately equal steps. For this option, the `colors` values must be selected from the following hue names: RED GREEN BLUE MAGENTA CYAN YELLOW.

- `cellfill` {*NONE* | SIGN | SIZE | DEV} provides the ability to display a symbol in the cell representing the coded value of large residuals. This is particularly useful for black and white output, where it is difficult to portray both sign and magnitude distinctly.
  - `NONE`: nothing (default).
  - `SIGN` draws + or - symbols in the cell, whose number corresponds to the shading density.
  - `SIZE` draws + or - symbols in the cell, whose size corresponds to the shading density.
  - `DEV` writes the value of the standardized residual in the cell.

- `htext` is a numeric value which specifies the height of text labels, in character cells. The default is `htext=1.3`. The program attempts to avoid overlap of category labels, but this cannot always be achieved. Adjust `htext` (or make the labels shorter) if they collide.
- `legend` {H | V | *NONE*} specifies the orientation of the legend for shading of residual values in mosaic tiles. 'V' specifies a vertical legend at the right of the display; 'H' specifies a horizontal legend beneath the display. Default: 'NONE'.
- `order` {*NONE* | [ DEV | JOINT ] | [ ROW | COL ] } specifies whether and how to perform a correspondence analysis to assist in reordering the levels of each factor variable as it is entered into the mosaic display. No analysis is performed if `order='NONE'`. Otherwise, `order` may be a character vector containing either 'DEV' or 'JOINT' to specify that the CA is performed on residuals from the model for the current subtable (DEV) or on residuals from the model of joint independence for this subtable (JOINT). In addition, `order` may contain either 'ROW' or 'COL' or both to specify which dimensions of the current subtable are considered for reordering. The usual options for this reordering are

      order = {JOINT COL};

  At present this analysis merely produces printed output which suggests an ordering, but does not actually reorder the table or the mosaic display.

- `shade` is a vector of up to 5 values of $\lvert d_{ij} \rvert$, which specify the boundaries between shading levels. If `shade={2 4}` (the default), then the shading density number `d` is:

  | `d` | Range of residual |
  |---|---|
  | 0 | $0 \le \lvert d_{ij} \rvert < 2$ |
  | 1 | $2 \le \lvert d_{ij} \rvert < 4$ |
  | 2 | $4 \le \lvert d_{ij} \rvert$ |

  Standardized deviations are often referred to a standard Gaussian distribution; under the assumption that the model fits, these values roughly correspond to two-tailed probabilities *p < .05* and *p < .0001* that a given value of $\lvert d_{ij} \rvert$ exceeds 2 or 4, respectively. Use `shade=` a big number to suppress all shading.
- `space` is a vector of two values which specify the *x, y* percent of the plotting area reserved for spacing between the tiles of the mosaic. The default value is 10 times the number of variables allocated to each of the vertical and horizontal directions in the plot.
- `split` is a character vector consisting of the letters `V` and `H` which specifies the directions in which the variables divide the unit square of the mosaic display. If `split={H V}` (the default), the mosaic alternates between horizontal and vertical splitting. If the number of elements in `split` is less than the maximum number in `plots`, the elements in `split` are reused cyclically.
- `verbose` {*NONE* | FIT | BOX} is a character vector of one or more words which controls verbose or detailed output. If `verbose` contains `'FIT'`, additional details of the fitting process (fitted frequencies, marginal proportions) are printed. If `verbose` contains `'BOX'`, additional details of the drawing process (tile dimensions, label placement) are printed.
- `vlabels` is an integer from 0 to the number of variables in the table. It specifies that variable names (in addition to level names) are to be used to label the first `vlabels` variables. The default is `vlabels=2`, meaning variable names are used in plots of the first two variables only.
- `zeros` is a matrix of the same size and shape as the input `table` containing entries of 0 or 1, where 0 indicates that the corresponding value in `table` is to be ignored or treated as missing or a structural zero.

  Zero entries cause the corresponding cell frequency to be fitted exactly; one degree of freedom is subtracted for each such zero. The corresponding tile in the mosaic display is outlined in black.

  If an entry in any marginal subtable in the order [A], [AB], [ABC] ... corresponds to an all-zero margin, that cell is treated similarly as a structural zero in the model for the corresponding subtable. Note, however, that tables with zero margins may not always have estimable models.

  If the `table` contains zero frequencies which should be treated as structural zeros, assign the `zeros` matrix like this:

      zeros = table > 0;

  For a square table, to fit a model of quasi-independence ignoring the diagonal entries, assign the `zeros` matrix like this (assuming a 4 x 4 table):

      zeros = J(4,4) - I(4);
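To make the residual types under `devtype` and the density levels under `shade` concrete, here is a small Python sketch. It is not taken from the MOSAICS source, and the function names `residual` and `shade_level` are hypothetical. The `GF` branch is the usual Pearson component; the `LR` and `FT` branches use the common textbook forms of the deviance and Freeman-Tukey residuals, and it is an assumption that MOSAICS computes exactly these.

```python
import math

def residual(f, m, devtype="GF"):
    """Signed residual for an observed count f and a fitted count m (m > 0)."""
    if devtype == "GF":    # Pearson component
        return (f - m) / math.sqrt(m)
    if devtype == "LR":    # deviance component (textbook form, assumed)
        term = f * math.log(f / m) if f > 0 else 0.0
        size = math.sqrt(max(2.0 * (term - (f - m)), 0.0))
        return math.copysign(size, f - m)
    if devtype == "FT":    # Freeman-Tukey residual (textbook form, assumed)
        return math.sqrt(f) + math.sqrt(f + 1) - math.sqrt(4 * m + 1)
    raise ValueError(f"unknown devtype: {devtype}")

def shade_level(d, shade=(2, 4)):
    """Count how many boundaries in `shade` the absolute residual reaches."""
    return sum(abs(d) >= cut for cut in shade)

# Black hair, brown eyes in the hair-eye margin:
# observed 68, fitted 108 * 220 / 592 under independence
m = 108 * 220 / 592
d = residual(68, m)
print(round(d, 2), shade_level(d))   # 4.4 2
```

With the default `shade={2 4}`, this cell therefore falls in the darkest of the two shading levels.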

There is one caveat imposed by this use of global variables: the
`mosaic` module should not be called from an IML module with
its own arguments, since this would cause all variables defined
within that module to be inaccessible as global variables. The
`mosaic` module may be called either in immediate mode, as in the
examples in the next section, or from an IML module defined without
arguments.

goptions hsize=7 in vsize=7 in;

The program uses the colors blue and red to draw the tiles
corresponding to positive and negative residuals. You can specify
the IML global `colors` variable to
change these assignments if you wish. (Or, change the default
values in the `globals` module.)

The program cannot access global fonts assigned with the `GOPTION FTEXT=`
and `HTEXT=` options. Instead, you may specify a desired font with the
IML global `font` and `htext` variables.
For some output devices (e.g., PostScript), specifying a hardware font
(e.g., `font = 'hwpsl009';` for Helvetica) can yield an enormous
reduction in the size of the generated graphic output files.

It uses three global SAS macro variables:

- DEVTYPE
- Device type: Use `%let devtype=eps;` for EPS output.
- DISPLAY
- Display option: Use `%let display=ON;` for ordinary use. Setting DISPLAY=OFF suppresses graphic output (for all devices).
- FIG
- Figure number: Initialize to 1 with `%let fig=1;`

    %global fig gsasfile devtype;
    %macro eps;
       %let devtype = EPS;
       %let fig=1;
       %let gsasfile=grfout.eps;
       %put gsasfile is: "&gsasfile";
       filename gsasfile "&gsasfile";
       goptions horigin=.5in vorigin=.5in;   *-- override, for BBfix;
       goptions device=PSLEPSFC gaccess=gsasfile
                gend='0A'x gepilog='showpage' '0A'x   /* only for 6.07 */
                gsflen=80 gsfmode=replace;
    %mend;

    free fittype;

before the next

    * Sex, Occupation and heart disease [Karger, 1980];
    data heart;
       input gender $ occup $ @;
       heart='Disease'; input freq @; output;
       heart='No Dis';  input freq @; output;
    cards;
    Male   Unempl     254    759
    Female Unempl     431  10283
    Male   WhiteCol   158   3155
    Female WhiteCol    52   3082
    Male   BlueCol     87   2829
    Female BlueCol     16    416
    ;
    proc sort data=heart;
       by heart occup gender;
    proc iml;
       title = 'Sex, Occupation, and Heart Disease';
       reset storage=mosaic.mosaic;
       load module=_all_;
       vnames = {'Gender' 'Occup' 'Heart' };
       run readtab('heart', 'freq', vnames, table, levels, lnames);
       plots = 2:ncol(levels);
       run mosaic(levels, table, vnames, lnames, plots, title);

Note that if you sort the dataset as in the example above, character-valued
index variables are arranged in **alphabetical order**.
For example, the levels of `occup` are arranged in the order
`BlueCol, Unempl, WhiteCol`, which may or may not be what you
want. The `PROC SORT` step can be omitted, in which case the
levels are ordered according to their order in the input dataset.

You can also use the `DESCENDING` option in the `PROC SORT`
step to reverse the order of the levels of a given factor.
For example, to reverse the levels of the `gender` variable, use

    proc sort data=heart;
       by heart occup descending gender;

    fit = (f + f`)/2;
    dev = (f - fit)/sqrt(fit);

where

    proc iml;
       dim = { 4 4 };
       /* Unaided distant vision data    Bishop etal p. 284 */
       /*          Left eye grade */
       f = {1520   266   124    66,
             234  1512   432    78,
             117   362  1772   205,
              36    82   179   492 };
       title = {'Unaided distant vision: Symmetry'};
       vnames = {'Right Eye','Left Eye'};
       lnames = { 'High' '2' '3' 'Low',
                  'High' '2' '3' 'Low'};
       reset storage=mosaic.mosaic;
       load module=_all_;
       %include '~/sasuser/mosaics/mosaicd.sas';
       fit = (f + f`)/2;
       dev = (f - fit)/sqrt(fit);
       run mosaicd(dim, f, vnames, lnames, dev, title);

The sample program,
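The two IML statements that compute `fit` and `dev` for the symmetry model can be traced by hand. The following Python sketch repeats the same arithmetic on the vision data outside SAS. It is only an illustration: plain nested lists stand in for IML matrices, and the degrees of freedom use the standard count for a symmetry model, which the text does not state.

```python
import math

# Unaided distant vision data (rows: right eye grade, columns: left eye grade)
f = [[1520,  266,  124,  66],
     [ 234, 1512,  432,  78],
     [ 117,  362, 1772, 205],
     [  36,   82,  179, 492]]
n = len(f)

# Fitted values under symmetry: average each cell with its transposed cell
fit = [[(f[i][j] + f[j][i]) / 2 for j in range(n)] for i in range(n)]

# Pearson-type deviations, as in the IML statement dev = (f - fit)/sqrt(fit)
dev = [[(f[i][j] - fit[i][j]) / math.sqrt(fit[i][j]) for j in range(n)]
       for i in range(n)]

chisq = sum(d * d for row in dev for d in row)
df = n * (n - 1) // 2   # standard df for symmetry in a square table (assumed)

for row in dev:
    print(" ".join(f"{d:7.2f}" for d in row))
print(f"Pearson chi-square = {chisq:.2f} on {df} df")   # 19.11 on 6 df
```

The deviations are zero on the diagonal and antisymmetric off it, so the mosaic built from `dev` shades mirror-image tiles in opposite colors.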

The module `haireye` creates the variables `table`, `levels`, `vnames`,
`lnames`, and `title`. Since the variables are to be entered into the
mosaic in the order hair color, eye color, and sex, the `table`
variable is created as a *2 x 16* matrix with
hair color varying most rapidly across the columns and sex varying
down the two rows. Note that the `lnames` variable is a
*3 x 4* matrix, and the last row contains two blank values. The
statement `run haireye;` creates these variables in the
SAS/IML workspace.

The first `run mosaic` statement produces two plots,
whose tiles show the [Hair][Eye] marginal table and the full
three-way table. Since `fittype` is not specified, the model
[HairEye] [Sex], in which Sex is independent of hair color and eye
color jointly, is fit to the three-way table. `split={V H}`
specifies that the first division of the mosaic is in the vertical
direction. The printed output produced from this run is shown in Figure 1.

The second `run mosaic` statement fits the same models,
but reorders the eye colors in the table to better display the
pattern of association between hair color and eye color in the
two-way table. It is also necessary to rearrange the eye color
labels in row 2 of `lnames`. (This reordering is based on a
correspondence analysis of residuals in the two-way table described
by Friendly (1994) carried out separately.) Note that the global
variables `split` and `htext` specified in the first
mosaic continue to be used here. The plots produced from this call
are shown in Figure 2 and Figure 3.

The third `run mosaic` statement plots only the
three-way display, showing residuals from the model in which hair
color, eye color and sex are mutually independent. This plot is
shown in Figure 4.

    goptions vsize=7in hsize=7in;   *-- square plot environment;
    proc iml;
    start haireye;
       *-- Hair color, eye color data;
       table = {
       /* ----brown---    -----blue-----    ----hazel---    ---green--- */
          32 53 10  3     11 50 10 30     10 25  7  5      3 15  7  8,    /* M */
          36 66 16  4      9 34  7 64      5 29  7  5      2 14  7  8 };  /* F */
       levels = { 4 4 2 };
       vnames = {'Hair' 'Eye' 'Sex' };          /* Variable names */
       lnames = {                               /* Category names */
          'Black' 'Brown'  'Red'   'Blond',     /* hair color */
          'Brown' 'Blue'   'Hazel' 'Green',     /* eye color  */
          'Male'  'Female' ' '     ' '    };    /* sex        */
       title = 'Hair color - Eye color data';
    finish;

    run haireye;
    reset storage=mosaic.mosaic;
    load module=_all_;

    *-- Fit models of joint independence (fittype='JOINT');
    plots = 2:3;
    split={V H};
    htext=1.6;
    run mosaic(levels, table, vnames, lnames, plots, title);

    *-- reorder eye colors (brown, hazel, green, blue);
    table = table[,((1:4) || (9:16) || (5:8))];
    lnames[2,] = lnames[2,{1 3 4 2}];
    plots=2:3;
    run mosaic(levels, table, vnames, lnames, plots, title);

    plots=3;
    fittype='MUTUAL';
    run mosaic(levels, table, vnames, lnames, plots, title);
    quit;

    +-------------------------------------------+
    |  Generalized Mosaic Display, Version 2.9  |
    +-------------------------------------------+

    TITLE
    Hair color - Eye color data

    VNAMES   LEVELS   LNAMES
    Hair          4   Black  Brown  Red    Blond
    Eye           4   Brown  Hazel  Green  Blue
    Sex           2   Male   Female

    Global options

    FITTYPE  DEVTYPE  FILLTYPE  SPLIT  SHADE
    JOINT    GF       M45       V H    2 4

    Factor:  1  Hair

    Marginal totals

    MARGIN     Black   Brown     Red   Blond

                 108     286      71     127

    Factor:  2  Eye

    Marginal totals

    MARGIN     Brown   Hazel   Green    Blue

    Black         68      15       5      20
    Brown        119      54      29      84
    Red           26      14      14      17
    Blond          7      10      16      94

    MODEL              DF          CHISQ     PROB
    {Hair}{Eye}         9   G.F.  138.290   0.0000
                            L.R.  146.444   0.0000

    Standardized Pearson deviations

               Brown   Hazel   Green    Blue

    Black       4.40   -0.48   -1.95   -3.07
    Brown       1.23    1.35   -0.35   -1.95
    Red        -0.07    0.85    2.28   -1.73
    Blond      -5.85   -2.23    0.61    7.05

    Factor:  3  Sex

    Marginal totals

    MARGIN          Male  Female

    Black Brown       32      36
    Black Hazel       10       5
    Black Green        3       2
    Black Blue        11       9
    Brown Brown       38      81
    Brown Hazel       25      29
    Brown Green       15      14
    Brown Blue        50      34
    Red   Brown       10      16
    Red   Hazel        7       7
    Red   Green        7       7
    Red   Blue        10       7
    Blond Brown        3       4
    Blond Hazel        5       5
    Blond Green        8       8
    Blond Blue        30      64

    MODEL              DF          CHISQ     PROB
    [Hair,Eye][Sex]    15   G.F.   28.993   0.0161
                            L.R.   29.350   0.0145

    Standardized Pearson deviations

                    Male  Female

    Black Brown     0.30   -0.27
    Black Hazel     1.28   -1.15
    Black Green     0.52   -0.46
    Black Blue      0.70   -0.63
    Brown Brown    -2.07    1.86
    Brown Hazel     0.19   -0.17
    Brown Green     0.57   -0.52
    Brown Blue      2.05   -1.84
    Red   Brown    -0.47    0.42
    Red   Hazel     0.30   -0.27
    Red   Green     0.30   -0.27
    Red   Blue      0.88   -0.79
    Blond Brown    -0.07    0.06
    Blond Hazel     0.26   -0.23
    Blond Green     0.32   -0.29
    Blond Blue     -1.84    1.65

Figure 1: Printed output for hair color, eye color data, run 1
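The `{Hair}{Eye}` lines of this printed output can be cross-checked independently. The following Python sketch recomputes the independence fit for the [Hair, Eye] marginal totals shown in the output. It is not MOSAICS code, and the variable names are illustrative only.

```python
import math

hair = ["Black", "Brown", "Red", "Blond"]
eye = ["Brown", "Hazel", "Green", "Blue"]
# [Hair, Eye] marginal totals from the printed output (rows: hair, columns: eye)
counts = [[ 68, 15,  5, 20],
          [119, 54, 29, 84],
          [ 26, 14, 14, 17],
          [  7, 10, 16, 94]]

total = sum(map(sum, counts))
row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]

gf = lr = 0.0
for i, row in enumerate(counts):
    for j, obs in enumerate(row):
        fitted = row_tot[i] * col_tot[j] / total   # independence model
        gf += (obs - fitted) ** 2 / fitted         # Pearson component
        lr += 2 * obs * math.log(obs / fitted)     # likelihood ratio component
df = (len(hair) - 1) * (len(eye) - 1)
print(f"df={df}  G.F.={gf:.3f}  L.R.={lr:.3f}")
```

The printed values should agree with the G.F. and L.R. chi-squares reported for the `{Hair}{Eye}` model (138.290 and 146.444 on 9 df).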

Figure 2: Two-way mosaic for hair color and eye color. Positive deviations from independence have solid outlines and are shaded blue. Negative deviations have dashed outlines and are shaded red. The two levels of shading density correspond to standardized deviations greater than 2 and 4 in absolute value.

Figure 3: Mosaic display for hair color,
eye color, and sex. The categories of sex are crossed with those of
hair color, but only the first occurrence is labeled. Residuals from
the model *[HE] [S]* are shown by shading.

Figure 4: Mosaic display for hair color,
eye color, and sex, showing residuals from the model of
complete independence, *[H] [E] [S]* (This figure was created
in a separate run, using the LEGEND option.)

The data is a $2^4$ table classified by Gender,
reported Pre-marital sex, Extra-marital sex and Marital Status, read
in by the DATA step

    data marital;
       input gender $ pre $ extra $ @;
       marital='Divorced'; input freq @; output;
       marital='Married';  input freq @; output;
    cards;
    Women Yes Yes    17    4
    Women Yes No     54   25
    Women No  Yes    36    4
    Women No  No    214  322
    Men   Yes Yes    28   11
    Men   Yes No     60   42
    Men   No  Yes    17    4
    Men   No  No     68  130
    ;
    proc sort data=marital;
       by marital extra pre gender;

In the `proc iml` step, the statement `use marital;` accesses the
data set. The variable `freq`
from the data set is read into the IML `table` variable, a
*16 x 1* matrix. Note that the levels of the character
variables `gender`, `pre`, and `extra` are
sorted alphabetically, so the category labels in `lnames`
must appear in this order.

    proc iml;
       use marital;
       read all var{freq} into table;
       levels = { 2 2 2 2 };
       vnames = {'Gender' 'Pre' 'Extra' 'Marital'};
       lnames = {'Men '          'Women ',
                 'Pre Sex: No'   'Yes',
                 'Extra Sex: No' 'Yes',
                 'Divorced'      'Married' };
       title = 'Pre/Extramarital Sex and Marital Status';
       reset storage=mosaic.mosaic;
       load module=_all_;
       split = {V H};
       htext=1.6;
       plots = 2:4;
       run mosaic(levels, table, vnames, lnames, plots, title);

       plots = 4;
       fittype='USER';
       title ='Model (GPE, PM, EM)';
       config = { 1 2 3,
                  2 4 4,
                  3 0 0};
       run mosaic(levels, table, vnames, lnames, plots, title);

The first `run mosaic` statement produces plots of the
2-way to 4-way tables, fitting models of joint independence. The
second `run mosaic` statement produces a plot of the 4-way
table, fitting the model [GPE] [PM] [EM] specified by the
`config` variable and `fittype='USER';`.
This model treats G, P, and E as explanatory, and M as a response.
This is equivalent to the logit model with main effects of
premarital sex and extramarital sex on marital status.
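To show what fitting a `config` such as [GPE] [PM] [EM] involves, here is a minimal Python sketch of iterative proportional fitting on the same data. It is a generic textbook version, not the SAS/IML `IPF` function that MOSAICS relies on. The helper names `margin` and `ipf`, the dictionary representation of the table, the 0-based variable positions, and the convergence settings are all assumptions made for illustration.

```python
import math

# (gender, pre, extra) -> (divorced, married), copied from the DATA step
rows = {("Women", "Yes", "Yes"): (17, 4),  ("Women", "Yes", "No"): (54, 25),
        ("Women", "No", "Yes"): (36, 4),   ("Women", "No", "No"): (214, 322),
        ("Men", "Yes", "Yes"): (28, 11),   ("Men", "Yes", "No"): (60, 42),
        ("Men", "No", "Yes"): (17, 4),     ("Men", "No", "No"): (68, 130)}
obs = {gpe + (m,): n
       for gpe, pair in rows.items()
       for m, n in zip(("Divorced", "Married"), pair)}

def margin(table, dims):
    """Sum a dict-of-cells table over all variables not listed in dims."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[d] for d in dims)
        out[key] = out.get(key, 0.0) + value
    return out

def ipf(table, config, tol=1e-8, max_iter=100):
    """Rescale a table of ones until it matches each margin in config."""
    fit = {cell: 1.0 for cell in table}
    for _ in range(max_iter):
        change = 0.0
        for dims in config:
            target, current = margin(table, dims), margin(fit, dims)
            for cell in fit:
                key = tuple(cell[d] for d in dims)
                new = fit[cell] * target[key] / current[key]
                change = max(change, abs(new - fit[cell]))
                fit[cell] = new
        if change < tol:
            break
    return fit

# [GPE] [PM] [EM] with 0-based positions G=0, P=1, E=2, M=3
config = [(0, 1, 2), (1, 3), (2, 3)]
fit = ipf(obs, config)
g2 = 2 * sum(n * math.log(n / fit[cell]) for cell, n in obs.items())
print(f"likelihood ratio chi-square = {g2:.3f}")
```

At convergence the fitted table reproduces the observed [GPE], [PM] and [EM] margins, which is the defining property of the fitted log-linear model.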

Using the `readtab` routine, this example can be simplified
as follows. The routine constructs the `table`, `levels`,
and `lnames` variables. (But note that the values of
the Pre and Extra variables are both simply 'Yes' or 'No'.)

    proc iml;
       vnames = {'Gender' 'Pre' 'Extra' 'Marital'};
       run readtab('marital', 'freq', vnames, table, levels, lnames);
       title = 'Pre/Extramarital Sex and Marital Status';
       reset storage=mosaic.mosaic;
       load module=_all_;
       split = {V H};
       htext=1.6;
       plots = 2:4;
       run mosaic(levels, table, vnames, lnames, plots, title);
       ...

The variables in a contingency table are reordered by the MARG
function (which calculates marginal totals) when the model specified
by the `config` parameter is the saturated model, with the
variables listed in the desired order. For example, for the four-way
table of the previous example, the configuration `{4,3,2,1}`
gives the same order of the variables created by the `proc sort` step.

MOSAICS.SAS includes an IML module `reorder` (shown partly
below) which will reorder the variables in any table. It also
rearranges the values in the `levels`, `vnames`,
and `lnames` variables in the same order.

    start reorder(dim, table, vnames, lnames, order);
       *-- reorder the dimensions of an n-way table;
       if nrow(order) =1 then order=order`;
       run marg(loc,newtab,dim,table,order);
       table = newtab;
       dim = dim[order,];
       vnames = vnames[order,];
       lnames = lnames[order,];
    finish;
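The effect of `reorder` on a table stored with the first variable varying fastest can be pictured as an index permutation. The following Python sketch is not the MOSAICS implementation, which delegates the work to the IML `marg` routine. The function name `permute_table` is hypothetical, positions are 0-based, and the table is assumed to be flattened row by row.

```python
def permute_table(levels, flat, order):
    """Reorder the variables of a flattened table.

    levels : number of levels per variable, first variable varying fastest
    flat   : cell counts in that storage order
    order  : order[j] is the old position of the variable that becomes j
    """
    new_levels = [levels[o] for o in order]
    new_flat = [0] * len(flat)
    for pos, value in enumerate(flat):
        # decode the old position into one index per variable
        idx, rem = [], pos
        for n_levels in levels:
            idx.append(rem % n_levels)
            rem //= n_levels
        # encode the same cell under the new variable order
        new_pos, stride = 0, 1
        for o, n_levels in zip(order, new_levels):
            new_pos += idx[o] * stride
            stride *= n_levels
        new_flat[new_pos] = value
    return new_levels, new_flat

# Marital (fastest), Extra, Pre, Gender: the 8 x 2 marital table, row by row
flat = [17, 4, 54, 25, 36, 4, 214, 322, 28, 11, 60, 42, 17, 4, 68, 130]
new_levels, new_flat = permute_table([2, 2, 2, 2], flat, [3, 2, 1, 0])
print(new_flat[:4])   # [17, 28, 36, 17]: Gender now varies fastest
```

Applying the same reversal twice returns the original storage order, which is a quick sanity check on any such permutation.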

The data `table` is defined, listing the observations in
the same order as in the DATA step `marital` shown in Example 2.
Note that `vnames` and `lnames` conform to this
order. After the call to `reorder` the variables `table`,
`levels`, `vnames`, and `lnames`
have been rearranged so that Gender is the first variable in the
mosaic, and Marital status is last.

    proc iml;
       *-- define the data variables;
       table={  17    4 ,   /* Women Yes Yes */
                54   25 ,   /* Women Yes No  */
                36    4 ,   /* Women No  Yes */
               214  322 ,   /* Women No  No  */
                28   11 ,   /* Men   Yes Yes */
                60   42 ,   /* Men   Yes No  */
                17    4 ,   /* Men   No  Yes */
                68  130 };  /* Men   No  No  */
       levels = { 2 2 2 2 };
       vnames = {'Marital' 'Extra' 'Pre' 'Gender'};
       lnames = {'Divorced'       'Married',
                 'Extra Sex: Yes' 'No',
                 'Pre Sex: Yes'   'No',
                 'Women '         'Men' };
       title = 'Pre/Extramarital Sex and Marital Status';
       reset storage=mosaic.mosaic;
       load module=_all_;
       order = { 4,3,2,1};
       run reorder(levels, table, vnames, lnames, order);
       split = {V H};
       plots = 2:4;
       run mosaic(levels, table, vnames, lnames, plots, title);
    quit;

Module name | Ways | Title | Variable names (dimensions) |
---|---|---|---|
abortion | 3 | Abortion opinion data | Sex (2) x Status (2) x Support Abortion (2) |
bartlett | 3 | Bartlett data | Alive? (2) x Time (2) x Length (2) |
berkeley | 3 | Berkeley Admissions Data | Admit (2) x Gender (2) x Dept (6) |
cancer | 3 | Breast Cancer Patients | Survival (2) x Grade (2) x Center (2) |
cesarean | 4 | Risk factors for infection in cesarean births | Infection (3) x Risk? (2) x Antibiotics (2) x Planned (2) |
detergen | 4 | Detergent preference data | Temperature (2) x M-User? (2) x Preference (2) x Water softness (3) |
dyke | 5 | Sources of knowledge of cancer | Knowledge (2) x Reading (2) x Radio (2) x Lectures (2) x Newspaper (2) |
employ | 3 | Employment Status Data | EmployStatus (2) x Layoff (2) x LengthEmploy (6) |
gilby | 2 | Clothing and intelligence rating of children | Dullness (6) x Clothing (4) |
haireye | 3 | Hair color - Eye color data | Hair (4) x Eye (4) x Sex (2) |
heckman | 5 | Labour force participation of married women 1967-1971 | 1971 (2) x 1970 (2) x 1969 (2) x 1968 (2) x 1967 (2) |
hoyt | 4 | Minnesota High School Graduates | Status (4) x Rank (3) x Occupation (7) x Sex (2) |
marital | 4 | Pre/Extramarital Sex and Marital Status | Marital (2) x Extra (2) x Pre (2) x Gender (2) |
mobility | 2 | Social Mobility data | Son's Occupation (5) x Father's Occupation (5) |
suicide | 3 | Suicide data | Sex (2) x Age (5) x Method (6) |
titanic | 4 | Survival on the Titanic | Class (4) x Sex (2) x Age (2) x Survived (2) |
victims | 2 | Repeat Victimization Data | First Victimization (8) x Second Victimization (8) |

The program `mosdata.sas` is set up so that running it will
create a SAS/IML storage catalog `MOSDATA` in the
`MOSAIC` library.
Once this has been done, any data set may be obtained by loading the
corresponding module from `MOSAIC.MOSDATA` and running it.
For example, the previous example
could be done using the module `marital`,
as shown below.

    proc iml;
       reset storage=mosaic.mosdata;
       load module=marital;
       run marital;
       reset storage=mosaic.mosaic;
       load module=_all_;
       ord = { 4,3,2,1};
       run reorder(dim, table, vnames, lnames, ord);
       split = {V H};
       plots = 2:4;
       run mosaic(dim, table, vnames, lnames, plots, title);
    quit;

1. Denote the number of levels of the $n$ variables by $l_1, \ldots, l_n$, and let $L_s = \prod_{i=1}^{s} l_i$. At step $s = 0$, start with one tile, a square of size $100 \times 100$, and let $L_0 = 1$.
2. The tiles in the mosaic are represented by an array $\mathbf{B}$ of four columns (called `boxes` in the program). Columns 1 and 2 give the $(x, y)$ location of the lower left corner of the tile; columns 3 and 4 give the horizontal and vertical lengths of the tile. At step 0, $\mathbf{B} = \{ 0 \; 0 \; 100 \; 100 \}$. There is one row for each tile. The following steps are repeated for each variable, $s = 1, \ldots, n$:
3. For variable $s$ find the marginal frequencies of variables $1, \ldots, s$, a vector of length $L_s$, with the levels of variable $s$ varying most rapidly.
4. Reshape this vector row-wise to a matrix $\mathbf{M} = \{ m_{gh} \}$ of $L_{s-1}$ rows and $l_s$ columns. (The array $\mathbf{M}$ is called `margin` in the program. See the arrays labelled "Marginal totals" in Figure 1.) The rows of $\mathbf{M}$ correspond to the tiles of the previous variables at step $s - 1$.
5. Each old tile is then divided vertically (if $s$ is odd) or horizontally ($s$ even) into $l_s$ tiles, with the width ($s$ odd) or height ($s$ even) of each tile proportional to $m_{gh} / m_{g+}$.
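
The division steps above can be sketched in a few lines of SAS/IML. This is only an illustration under stated assumptions, not the code of `divide1`: the module name `split_tiles`, its arguments, and the 2 x 2 frequencies are hypothetical, and all row totals are assumed to be nonzero.

```sas
proc iml;
   /* Hypothetical helper (not part of MOSAICS.SAS): divide each tile in B
      according to the rows of M. B has one row {x y width height} per tile;
      M has one row of marginal frequencies per tile. */
   start split_tiles(B, M, vertical);
      p = M / M[,+];                        /* m_gh / m_g+ : row proportions */
      newB = j(nrow(M)*ncol(M), 4, 0);
      k = 0;
      do g = 1 to nrow(M);
         offset = 0;                        /* extent already used within tile g */
         do h = 1 to ncol(M);
            k = k + 1;
            if vertical then do;            /* s odd: split the width */
               extent = p[g,h] # B[g,3];
               newB[k,] = (B[g,1]+offset) || B[g,2] || extent || B[g,4];
            end;
            else do;                        /* s even: split the height */
               extent = p[g,h] # B[g,4];
               newB[k,] = B[g,1] || (B[g,2]+offset) || B[g,3] || extent;
            end;
            offset = offset + extent;
         end;
      end;
      return (newB);
   finish;

   /* hypothetical frequencies: rows = variable 1, columns = variable 2 */
   table = {20 30,
            10 40};
   B0 = {0 0 100 100};                      /* step 0: one 100 x 100 tile   */
   M1 = t(table[,+]);                       /* s = 1: 1 x 2 marginal totals */
   B1 = split_tiles(B0, M1, 1);             /* vertical division            */
   B2 = split_tiles(B1, table, 0);          /* s = 2: horizontal division   */
   print B1, B2;
quit;
```

Here the two tiles created for the first variable are each divided for the second variable in proportion to one row of `table` (20:30 and 10:40). No spacing between tiles is applied in this sketch.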

- At any stage the division of the tiles for the current variable is in proportion to the entries in each row of $\mathbf{M}$ divided by the row totals.
- We can draw the tiles representing the marginal frequencies at any stage, not just the final stage as Hartigan and Kleiner do.
- Fitting the model of joint independence of the current variable with all previous variables jointly is equivalent to testing independence of the rows and columns of the matrix $\mathbf{M}$. For example, for a three-way table, the expected frequencies under the model $[AB][C]$ can be expressed in terms of the $IJ \times K$ matrix $\mathbf{M}$ as $m_{(ij)+} \, m_{+k} / m_{++}$.
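
The last point can be illustrated with a short SAS/IML sketch. The frequencies are hypothetical, and Pearson residuals are used here as one common choice of residual. This is not the fitting code used by MOSAICS, which fits its models through the `mfit` module.

```sas
proc iml;
   /* Hypothetical IJ x K matrix M for a 2 x 2 x 2 table:
      rows index the (A,B) combinations, columns index the levels of C */
   M = {10 20,
        30 40,
        15 25,
        20 40};
   E     = M[,+] * M[+,] / M[+,+];       /* m_(ij)+ m_+k / m_++          */
   resid = (M - E) / sqrt(E);            /* Pearson residuals            */
   chisq = sum(resid##2);                /* Pearson chi-square           */
   df    = (nrow(M)-1) * (ncol(M)-1);    /* (IJ - 1)(K - 1)              */
   print E, resid, chisq df;
quit;
```

The subscript reduction operators `[,+]`, `[+,]` and `[+,+]` give the row totals, column totals and grand total of the matrix, so the expected frequencies under [AB][C] come from a single outer product.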

This spacing of the tiles is accomplished by constructing an
unspaced mosaic in a reduced area (determined by the `space`
parameter), then expanding to include the necessary spacing.
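
The idea of building the mosaic in a reduced area and then expanding it can be shown along a single axis. The sketch below is an assumption-laden illustration only: the fixed `gap` value and the way the reduced extent is computed are hypothetical and may differ from how the program interprets the `space` parameter.

```sas
proc iml;
   p     = {0.2 0.5 0.3};               /* hypothetical tile proportions along one axis */
   total = 100;                         /* full extent of the display                   */
   gap   = 5;                           /* assumed fixed gap between adjacent tiles     */
   k     = ncol(p);
   reduced = total - gap*(k-1);         /* extent left over for the tiles themselves    */
   width = p # reduced;                 /* unspaced tile widths in the reduced area     */
   x0 = cusum(0 || width[,1:(k-1)]);    /* left edges of the unspaced tiles             */
   x  = x0 + gap # (0:(k-1));           /* shift tile j to the right by (j-1) gaps      */
   print x width;
quit;
```

Because the gaps are added only after the proportional widths are computed, the relative sizes of the tiles are unchanged and the last tile still ends at the full extent of the display.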

    mosaic              *-- check inputs, assign default values;
     |
     |-- divide         *-- fit models and draw the mosaic display;
          |
          |--reduce     *-- find reduced model for factors 1:f;
          |
          |--mfit       *-- fits a specified model;
          |
          |--chisq      *-- calculate chisquares;
          |
          |--df         *-- calculate degrees of freedom;
          |   |--terms    *-- find all terms in a loglinear model;
          |   |--vars_in  *-- find variables in a term;
          |
          |--modname    *-- expand config into string for model label;
          |
          |--divide1    *-- divide the mosaic for the next variable;
          |
          |--space      *-- space the tiles in the current display;
          |
          |--labels     *-- calculate label placements;
          |
          |--gboxes     *-- draw the current display;
              |--fillbox  *-- custom shading;
              |--glegend  *-- draw legend;

    readtab             *-- read input frequencies, level names;
     |--readlab         *-- read level names, reorder input

    reorder             *-- reorder the dimensions of an n-way table;

Figure 5: Calling structure of the modules in MOSAICS.SAS

The top-level module, `mosaic`, simply validates the input
parameters, assigns default values for global variables, and calls
the module `divide`. The steps in the algorithm described
above are carried out by `divide`; the calculation of the new
tiles in step 5 is performed in `divide1`.

- Fixed conflict between the global variable `DEVTYPE` and the macro variable used for graphics device control.
- Changed circle blanking used for `CELLFILL` to white/black text, depending on shading density.
- Added control of threshold for `CELLFILL`. You can now say `CELLFILL = DEV 1.0` and all absolute residuals > 1.0 will have their values written in the tiles.
- Added calculation and display of adjusted residuals ($= d / \sqrt{1-h}$).
- The default font now depends on device driver, making it easier to get PS/EPS output in Windoze.
- Added NAME global variable for graph names in the graphics catalog.
- Fixed a bug in the calculation of adjusted residuals.
- Added CELLFILL='FREQ' to display cell frequency in the tiles.
- Added ABBREV global to abbreviate variable names in models and titles.

- Added `vlabels` global variable to control the number of variables for which variable names are used in the display; `fuzz` now sets line style solid.
- Global variables are now set in a separate module to make changing defaults easier.
- In the `reorder` module, you can now specify the variable names in the new order, rather than indices. The `config` configuration may also be specified using variable names.
- Added code for models of joint independence and conditional independence in which any variable may be specified as the jointly independent or conditioning one.

- Added a GSKIP module, for EPS output to separately named
graphics files. Requires a global macro variable,
`&DEVTYPE = EPS`.

- Added `zeros=` global input matrix to handle structural zeros.
- Added ability to display chisquare value in the mosaic title for each plot, by using '&G2' in the title string.
- Changed default values to filltype={HLS HLS}, colors={BLUE RED} since this is what I always use now, except for monochrome output.

- Added `readtab` routine for easier input from a SAS dataset.
- Added `devtype='FT'` to calculate and display Freeman-Tukey residuals.
- Character values of global input variables no longer need be entered in upper case.

- Added ability to fit a sequence of Markov models (`fittype='MARKOV';`) for lag sequential data.
- Fit the equiprobability model for the display of the first variable.

- Installation simplified by creating a separate file, MOSAICM.SAS, to install IML modules.
- Filltypes changed to allow separate coding for positive and negative residuals, and to provide grayscale shading levels.
- Added ability (`cellfill`) to print a symbol in the cell symbolizing the value of the residual.

- Friendly, M. (1991). Mosaic displays for multi-way contingency tables. York Univ.: Dept. of Psychology Reports, 1991, No. 195.
- Friendly, M. (1992). Mosaic displays for loglinear models. Proceedings of the Statistical Graphics Section, American Statistical Association, 61-68.
- Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, *89*, 190-200.
- Friendly, M. (1998). Extending Mosaic Displays: Marginal, Partial, and Conditional Views of Categorical Data. Paper presented at the Workshop on "Data Visualization in Statistics", July 6-10, 1998, held at Drew University. [Published in JCGS, 1999, 8:373-395.]
- Hartigan, J. A., and Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface, 268-273. New York: Springer-Verlag.
- Wang, C. M. (1985). Applications and computing of mosaics. Computational Statistics & Data Analysis, 3, 89-97.