Summary : Skewness and Kurtosis

Mohammed Anwar Sayid (sayid@signal.dra.hmg.gb)
Mon, 22 Jan 96 08:21:47 +0000


Dear All,

Last week I sent a request about skewness and
kurtosis. The response was tremendous and this is a
posting of all the responses.

A big thank you to everyone who replied!

Best Regards, Anwar

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>REPLY 1

> 1. How do you remove the skewness and kurtosis?

There is no general rule for how to do it. Usually people try to
log-transform their data, and it often helps.
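
A minimal sketch of the log-transform idea, assuming a made-up lognormal sample (not data from the original question) and a moment-based skewness with divisor N. The helper name `skewness` and the seed are illustrative choices only.

```python
import math
import random


def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (divisor N)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


random.seed(0)  # arbitrary seed, only for reproducibility
# Hypothetical right-skewed sample: lognormal with mu=0, sigma=1.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(x) for x in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

For this kind of sample the logged values should show skewness near zero, while the raw values are strongly right-skewed. Real data will rarely be this well behaved.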

> 2. What does it mean to remove the skewness and kurtosis?

It means to transform your data in such a manner that they follow the
normal distribution (so that you are allowed to use a test such as
Student's t-test).

> 3. Is it sensible to remove the skewness and kurtosis?

It is only done if you want to apply a test that requires
normally distributed data.

> 4. How does the underlying probability distribution change?

It changes from whatever it was to normal, if you succeed. The terms
skewness and kurtosis are related to the normal distribution and denote
two possible ways in which an experimental distribution may deviate from
the normal distribution.

> 5. What is a "small" or "OK" value for skewness and kurtosis?

These are non-statistical terms and have no precise meaning. They are
probably meant to express that one's distribution departs from the
expected values only a little. In fact, you should use a test to assess
the significance level of the deviation you observe in your data.

> Numerical Recipes (Ch.14 Statistical Description of Data)
> has a nice discussion about the meaning of the mean, variance,
> skewness and kurtosis but is lukewarm about using the
> skewness and kurtosis. Kendall (Vol.1) doesn't go into much
> detail and basically states the facts. Also the skewness
> is defined differently in Numerical Recipes than in Kendall,
> which talks about a measure defined by Pearson.
>

skewness: b1 = sum((xi - mean)^3) / ((N-1) * s^3);  for a normal
distribution (ND), b1 = 0

tested by normal deviate z = sqrt( b1 (N+1)(N+3) / (6(N-2)) )

kurtosis: b2 = sum((xi - mean)^4) / ((N-1) * s^4);  for ND, b2 = 3

tested by normal deviate z =
(b2 - 3 + 6/(N+1)) * sqrt( (N+1)^2 (N+3)(N+5) / (24 N (N-2)(N-3)) )
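
The formulas above can be sketched in code roughly as follows. The function names `b1_b2` and `z_scores` are hypothetical. One assumption matters: because b1 as defined here can be negative, the sketch computes the skewness deviate as b1 divided by sqrt(6(N-2)/((N+1)(N+3))), the usual normal-theory standard error, rather than taking a square root of b1 itself. That reading of the skewness test is an interpretation, not something the reply states.

```python
import math


def b1_b2(xs):
    """Skewness b1 and kurtosis b2 with the (N-1) divisor used above."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance
    s = math.sqrt(s2)
    b1 = sum((x - mean) ** 3 for x in xs) / ((n - 1) * s ** 3)
    b2 = sum((x - mean) ** 4 for x in xs) / ((n - 1) * s ** 4)
    return b1, b2


def z_scores(xs):
    """Approximate normal deviates for b1 and b2 (requires N > 3)."""
    n = len(xs)
    b1, b2 = b1_b2(xs)
    # Assumption: skewness divided by its normal-theory standard error.
    z_skew = b1 * math.sqrt((n + 1) * (n + 3) / (6 * (n - 2)))
    # Kurtosis: centre at the expected value 3 - 6/(N+1), then scale.
    z_kurt = (b2 - 3 + 6 / (n + 1)) * math.sqrt(
        (n + 1) ** 2 * (n + 3) * (n + 5) / (24 * n * (n - 2) * (n - 3))
    )
    return z_skew, z_kurt


sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # tiny symmetric example
b1, b2 = b1_b2(sample)
z_skew, z_kurt = z_scores(sample)
print(b1, b2, z_skew, z_kurt)
```

For this five-point sample b1 is 0 and b2 is 1.36, which gives a kurtosis deviate of about -1.28. With so few points the normal approximation is of course not to be trusted; the example only checks the arithmetic.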

These formulas have been widely used for a long time. In fact, I have
never seen any way of calculating skewness and kurtosis other than the
moment-based approach.

Cheers,

Michal

------------------------------------------------------------------------
Michal Kucera
University of Goteborg tel: +46.31.773 44 70
Department of Marine Geology fax: +46.31.773 49 03
Earth Sciences Centre e-mail: michal@gvc.gu.se
S-413 81 Goteborg, Sweden program used: Pegasus Mail

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>REPLY 2

>Given a sample x(1),...,x(n)

1. How do you remove the skewness and kurtosis?
2. What does it mean to remove the skewness and kurtosis?
3. Is it sensible to remove the skewness and kurtosis?
4. How does the underlying probability distribution change?

<

You might find the book, Visualizing Data, by William S. Cleveland a
very useful place to start. Published in 1993 by Hobart Press, it has an
ISBN number of 0-9634884-0-6. Contact me directly if you need the telephone
number for Hobart Press.

Steve
aulenbac@ncar.ucar.edu

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLY 3

Hello Anwar,

I found the book "Beyond Anova, Basics of Applied Statistics" by
Rupert G. Miller did a great job at answering your questions. The book
is published by John Wiley & Sons.

Good Luck,

Jon Binkley

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLY 4

I will be interested in any other responses you get and would appreciate if
you would post a summary.

I have experience doing k-means cluster analysis and factor analysis/PCA
with very skewed data. In my experience, k-means is much more sensitive to
scaling than FA/PCA. I use two different ways of scaling to remove
skewness: (1) use deciles rather than raw data and (2) take logs (my data
are counts starting from 0 with a long right tail). I seem to get the best
results (most interpretable and face valid) with deciles with cluster
analysis. With FA/PCA, raw data and logs are similar and give the best
results. There's a paper on different standardizations that includes a
simulation study, but they only look at symmetric data:

Milligan, G. "A study of standardization of variables in cluster analysis"
Journal of Classification, 5:181-204 (1988). He gives a reference to a
paper in the Australian Computer Journal on using logs.
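
The two scalings mentioned above (deciles and logs) might look like this in outline. This is a sketch under assumptions: the counts are made up, the tie-handling rule for deciles is just one possible convention, and `log(1 + x)` is an assumed way of coping with zero counts, since the reply does not say how zeros were handled.

```python
import math


def decile_scores(xs):
    """Replace each value by its decile (1..10) based on its rank.

    Tied values share the rank of their first occurrence in sorted order
    (an assumed convention, not taken from the reply).
    """
    n = len(xs)
    first_rank = {}
    for rank, value in enumerate(sorted(xs)):
        first_rank.setdefault(value, rank)
    return [min(10, first_rank[x] * 10 // n + 1) for x in xs]


def log_counts(xs):
    """log(1 + x), so that zero counts stay finite (an assumption)."""
    return [math.log1p(x) for x in xs]


counts = [0, 0, 1, 1, 2, 3, 5, 8, 40, 250]  # made-up counts, long right tail
deciles = decile_scores(counts)
logs = log_counts(counts)
print(deciles)
print([round(v, 2) for v in logs])
```

Deciles discard the size of the gaps between values entirely, whereas logs only compress them, which may be part of why the two behave differently under k-means.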

Dr. Edward C. Malthouse
Department of Marketing
Kellogg Graduate School of Business
Northwestern University
Evanston, IL 60208
Tele: 708-467-1213
email: ecm@casbah.acns.nwu.edu

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLY 5

I don't know what it means to remove skewness and kurtosis
from your data, unless it means to transform the data
in such a way that skewness and kurtosis have specified
values. This would be analogous to standardizing so that
the mean and variance are zero and one.
Skewness and kurtosis are chiefly used to diagnose how
normally distributed data are, since the standard normal
distribution has skewness zero and kurtosis 3 (that is,
if skewness and kurtosis are defined as the standardized
3rd and 4th moments, respectively). One way to do this
would be with Johnson's transformations Su and Sb (see
Johnson & Kotz, "Continuous Univariate Distributions", vol I.)
You would do this if you felt that it is necessary or
desirable to transform your data to normality. However, it
is not easy to do, and Johnson & Kotz don't actually tell you
how to do it. Since the normal assumption is quite robust,
this is not often considered necessary.
In case the data are far from normal, the skewness and
kurtosis values can be used to select a probability distribution
with the same first four moments from the Pearson system, which contains
most commonly used distributions. Again, see Johnson & Kotz, vol I.
Good luck,
Bob Byers
mathematical statistician
Div of HIV/AIDS
Centers for Disease Control
Atlanta, GA 30333
rhb1@cidhiv1.em.cdc.gov

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLY 6

You can't really remove skewness and kurtosis. You can transform
your data to reduce them...but even that doesn't always work. The
transformation is supposed to result in a bell-shaped distribution (normal).

If you are dealing with multiple groups, and all of the groups' distributions
are similar (i.e., similarly kurtotic or skewed), then ANOVA designs will
not be pathologically affected...you can still do them. See Kirk, 1994
3rd ed. (Experimental Design) for more info.

The value of an extreme measure of skewness or kurtosis depends on what program
you are using. For example, with SAS, anything above or below zero
indicates positive/negative skewness (similar for kurtosis). I think
S+ is the same...check the manual or help screen.

Kendall will not go into detail because it is a theoretical book, not
applied. If your data are multivariate (and even if not), check
Tabachnick and Fidell (1989) at the beginning. They have practical
measures of skewness/kurt.

But, above all, check the manual of S+ or whatever program to see
exactly how they compute it. Theoretically, of course, they are the
3rd and 4th central moments, but computer calculation might differ.

I indicated in my msg that for SAS, any value of skewness or kurtosis
greater or less than zero meant positive or negative skewness or
kurtosis. That is true, but the acceptable value might be within
+/- a number...depending on what guidelines you follow. Consult the
manual or Tabachnick & Fidell, from previous msg.

Approximate standard errors for both, along with significance tests,
are on page 72.

I believe I have used +/- 2 before.
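
A rough sketch of a "+/- 2" style check, under two assumptions that are not confirmed by the message: that the threshold applies to the statistic divided by its standard error, and that the commonly quoted large-sample approximations sqrt(6/N) for skewness and sqrt(24/N) for kurtosis are the ones intended. Kurtosis is taken as excess kurtosis (zero for the normal), matching the SAS-style convention described above. The function names are hypothetical.

```python
import math


def skew_and_excess(xs):
    """Moment-based skewness and excess kurtosis (divisor N)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def flag_nonnormal(xs, threshold=2.0):
    """Flag skewness/kurtosis whose ratio to an approximate SE exceeds threshold."""
    n = len(xs)
    skew, excess = skew_and_excess(xs)
    z_skew = skew / math.sqrt(6.0 / n)     # assumed large-sample SE
    z_kurt = excess / math.sqrt(24.0 / n)  # assumed large-sample SE
    return {
        "z_skew": z_skew,
        "z_kurt": z_kurt,
        "skew_flag": abs(z_skew) > threshold,
        "kurt_flag": abs(z_kurt) > threshold,
    }


skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up sample with one large value
symmetric = [1, 2, 3, 4, 5]
result_skewed = flag_nonnormal(skewed)
result_symmetric = flag_nonnormal(symmetric)
print(result_skewed)
print(result_symmetric)
```

Different packages use different estimators and standard errors, so the program's manual remains the authority on what its reported values mean.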

From: LAURA THOMPSON <THOMPSONL@baylor.edu>

>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLY 7

>
>Dear All,
>
>Given a sample x(1),...,x(n)
>
>1. How do you remove the skewness and kurtosis?

In general you cannot remove the skewness or kurtosis unless you can find a
clever transformation that will transform the data so it has a normal
distribution. The normal distribution has all its cumulants equal to zero
except the first two. Skewness and kurtosis are simply the third and fourth
cumulants divided by the third and fourth power of the standard deviation.
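
As a small illustration of the cumulant relation just described, the sketch below uses plug-in sample moments (divisor N), so the results are biased estimates rather than the bias-corrected ones. The third cumulant equals the third central moment, and the fourth cumulant equals the fourth central moment minus three times the squared variance. The function name `cumulant_shape` is made up for this example.

```python
def cumulant_shape(xs):
    """Return (skewness, excess kurtosis) from plug-in sample cumulants."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    k2 = m2                  # second cumulant = variance
    k3 = m3                  # third cumulant = third central moment
    k4 = m4 - 3.0 * m2 ** 2  # fourth cumulant
    sigma = k2 ** 0.5
    return k3 / sigma ** 3, k4 / sigma ** 4


skew, excess = cumulant_shape([1.0, 2.0, 3.0, 4.0, 5.0])
print(skew, excess)
```

With this definition the kurtosis of a normal distribution is 0 rather than 3, which is the "excess" convention.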

>2. What does it mean to remove the skewness and kurtosis?

As discussed above, it means the third and fourth cumulants are made zero.

>3. Is it sensible to remove the skewness and kurtosis?

Usually, it means you can operate in a transform domain with normally
distributed data. (Assuming transformation made the higher order cumulants
zero). For example, the lognormal distribution has a shape parameter;
taking logs gives normally distributed data, which has no shape factor to
worry about.

>4. How does the underlying probability distribution change?

Removing skewness and kurtosis means the third and fourth cumulants are
made zero. If all other higher order cumulants are zero the distribution
is Gaussian.

>
>
>Numerical Recipes (Ch.14 Statistical Description of Data)
>has a nice discussion about the meaning of the mean, variance,
>skewness and kurtosis but is lukewarm about using the
>skewness and kurtosis. Kendall (Vol.1) doesn't go into much
>detail and basically states the facts. Also the skewness
>is defined differently in Numerical Recipes than in Kendall,
>which talks about a measure defined by Pearson.
>

Kendall vol. 1 is a tour-de-force of the subject having extensive
discussion of the subject of cumulants which are the basis of skewness and
kurtosis. Use Kendall over Numerical Recipes. Be careful to distinguish
between the definition of skewness and kurtosis in terms of parameters of
the parent distribution or the theoretical moments of the parent
distribution and *estimates* of them from data. Explicit formulas for bias
corrected estimates are in Cramer (Mathematical Methods of Statistics)
p.386. Cramer uses the term "excess" instead of kurtosis. The material is
also in Kendall, but harder to find and less explicitly stated.

>Any advice and (discussion) references would be much appreciated.
>I will post a summary, if anyone is interested.
>

>
>Thanks in advance, Anwar
>
>
>5. What is a "small" or "OK" value for skewness and kurtosis?

Look at the discussion in Kendall on systems of distributions and material
on Edgeworth expansions. If by "OK" you mean how small so that I can treat
the data as normally distributed, that is a hard question. The Edgeworth
expansion is a bad approximation for the tails of the distribution, so
"small" might not be small enough if you are concerned with far tail
behavior. See Kolassa "Series Approximation Methods in Statistics." You
might try simulation. A value for skewness and kurtosis implies values for
the third and fourth moments (given the mean and variance). Look at Devroye
"Non-Uniform Random Variate Generation" for ways to generate random
variables with specified moments and therefore specified skewness and
kurtosis. Then you might find out how small you need for your purposes.
Complicated? Yes. Life is not easy.
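
One way such a simulation might be set up is sketched below. Instead of the moment-matching generators in Devroye, it swaps in a simpler device: gamma-distributed data, whose skewness is 2/sqrt(shape) and whose excess kurtosis is 6/shape, so the two cannot be varied independently. The sample size of 20, the hard-coded t critical value for 19 degrees of freedom, the shapes tried, and the function name are all illustrative assumptions.

```python
import math
import random
import statistics

N = 20
T_CRIT_19 = 2.093  # two-sided 95% Student t critical value, 19 d.f.


def t_interval_coverage(shape, reps=2000, seed=1):
    """Fraction of nominal 95% t intervals that cover the true mean.

    Data are gamma(shape, scale=1): true mean = shape,
    skewness = 2 / sqrt(shape).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gammavariate(shape, 1.0) for _ in range(N)]
        mean = statistics.fmean(xs)
        half_width = T_CRIT_19 * statistics.stdev(xs) / math.sqrt(N)
        if mean - half_width <= shape <= mean + half_width:
            hits += 1
    return hits / reps


coverage = {shape: t_interval_coverage(shape) for shape in (0.25, 1.0, 4.0, 100.0)}
for shape, cov in coverage.items():
    print(f"skewness {2 / math.sqrt(shape):5.2f}: coverage {cov:.3f}")
```

Coverage close to 0.95 suggests the skewness is "small enough" for this particular purpose and sample size. Tail probabilities further out would need their own check.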

Michael Axelrod

>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLY 8

> Given a sample x(1),...,x(n)

> 1. How do you remove the skewness and kurtosis?
> 2. What does it mean to remove the skewness and kurtosis?
> 3. Is it sensible to remove the skewness and kurtosis?
>4. How does the underlying probability distribution change?

1. Most common way: choose a transform. Ex: most positive data
that is not approx. symmetric is usually positively skewed, and
the log transform is good. Be warned: sometimes a transform fixes
one problem but causes another.

2. Most often you want skewness=0 and kurtosis=3, so have approx.
normality. Simple tests for approx. normality calculate the skewness
and kurtosis. So, you "remove" skewness and hope for kurtosis approx.
equal to kurtosis of normal distn. This depends on your setting.
See (3) below.

3. Depends on setting. Suppose you want to do a t-test.
GOOD ref: Beyond ANOVA by Rupert Miller (Wiley). See, for example,
page 6, where the appeal to the central limit theorem for large n
to rescue us in interpreting the t-statistic in the case of nonnormal
data reminds us that "how large is large" depends on skewness and
kurtosis. Message on page 6: skewness affects the convergence to
normality more than kurtosis does.
If simple normal probability plots don't look too bad (informal
checks for approx. normality) then no need to worry about t-statistic.
Also, modern "resample-the-data" methods such as the bootstrap
can be used to create confidence intervals WITHOUT worrying about
transforms to approx. normality.

4. If you transform x~f(x) to y=log(x)~g(y) then use standard
change-of-variable methods (1-dimensional jacobian here) to
either approx the distn of y or to get it exactly in simple cases.
Ref: nearly any first text on statistical inference, say
DeGroot: Prob. and Statistics (Addison-Wesley)
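
The "resample-the-data" idea in point 3 can be sketched as a percentile bootstrap confidence interval for the mean. The percentile method is only one of several bootstrap variants and is chosen here for brevity; the data, the number of resamples, and the function name are made up for illustration.

```python
import random
import statistics


def bootstrap_ci_mean(xs, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of xs."""
    rng = random.Random(seed)
    n = len(xs)
    # Resample with replacement, recompute the mean each time, then sort.
    means = sorted(statistics.fmean(rng.choices(xs, k=n)) for _ in range(reps))
    lower = means[round((alpha / 2) * reps)]
    upper = means[round((1 - alpha / 2) * reps) - 1]
    return lower, upper


data = [0.2, 0.4, 0.5, 0.9, 1.1, 1.3, 2.0, 2.8, 4.5, 12.0]  # made-up skewed sample
low, high = bootstrap_ci_mean(data)
print(f"95% percentile bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

No transformation to normality is involved, although with a sample this small and this skewed the interval should still be read with caution.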

=======================================================================

_ _ _ _ _ _ _ _ _ _ _
| |
| | Tom Burr Los Alamos National Laboratory
| *LANL | Technical Staff Member NIS-7
| *SF | e-mail: tburr@lanl.gov
| | phone: 505-665-7865 or 667-7777
| *ABQ | fax: 505-667-7626
| |
| "The Land of |
| Enchantment" |
| |
| ________|
| ________|
|___|

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>