Re: [S] Idle curiosity --> Thinking about the Bootstrap

Jens Oehlschlaegel (oehl@Psyres-Stuttgart.DE)
Tue, 17 Feb 1998 22:14:46 +0100 (MET)


Dear Charles,

Here is the idea I announced: using the bootstrap.

The statistic of interest is

seven.wins <- function(x,k){
  tab <- tabulate(x,k)      # counts of the values 1..k
  o <- order(tab,runif(k))  # sort by count, breaking ties at random
  winner <- o[k]            # value with the highest count
  winner==7
}

which returns TRUE if seven has the highest frequency in a sample. The
following function

seven.test <- function(counts, B=1000){
  k <- length(counts)
  n <- sum(counts)
  # one bootstrap sample of size n per column, drawn with replacement
  mat <- matrix(sample(1:k, B*n, replace=TRUE, prob=counts/n), ncol=B)
  resample.stat <- apply(mat,2,seven.wins,k)
  table(resample.stat) / B
}

does bootstrap resampling (B replications), evaluates each bootstrap
sample, and returns the distribution of seven.wins().
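
A possible shortcut, sketched here under the assumption that rmultinom()
is available (it is in current R; I have not checked S-PLUS): since only
the table of counts matters, the counts can be drawn directly instead of
building an n x B matrix of raw values. The helper seven.wins.counts() is
hypothetical and only mirrors seven.wins() for an already tabulated sample.

```r
# Hypothetical helper: same rule as seven.wins(), but takes counts directly
seven.wins.counts <- function(tab) {
  k <- length(tab)
  o <- order(tab, runif(k))  # sort by count, breaking ties at random
  o[k] == 7                  # TRUE if seven has the highest count
}

set.seed(1)  # only to make this sketch reproducible
counts <- c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8)
B <- 1000
# rmultinom() returns a k x B matrix: one column of counts per resample
tabs <- rmultinom(B, size = sum(counts), prob = counts / sum(counts))
wins <- apply(tabs, 2, seven.wins.counts)
mean(wins)  # proportion of resamples in which seven wins
```

This should give the same distribution as seven.test(), because tabulating
n draws with probabilities counts/n is a multinomial draw.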

The observed value of the statistic seven.wins() was
> seven.wins(rep(1:10,c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8 )),10)
[1] T

Resampling from the original data gives
> seven.test(c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8 ))
TRUE
1

i.e. seven wins in every one of the 1000 resamples, which corresponds to
a very narrow confidence interval around TRUE.


However, approximately equal probabilities give
> seven.test(c(23, 23, 23, 24, 24, 24, 24, 24, 24, 24 ))
FALSE TRUE
0.889 0.111

i.e. under the sharp null, seven is not rarely the winner (about 11% of
the resamples), but so far we have ignored the margin by which it wins.

We could use a more sensitive statistic seven.advantage()

seven.advantage <- function(x,k){
  n <- length(x)
  tab <- tabulate(x,k)
  o <- order(tab,runif(k))  # sort by count, breaking ties at random
  winner <- o[k]
  # compare seven with the runner-up if seven wins, else with the winner
  (tab[7]-ifelse(winner==7,tab[o][k-1],tab[o][k])) / n
}

which returns the difference between the proportion of sevens and the
proportion of the runner-up (if seven wins) or of the most frequent
number (if it does not), and

seven.test.2 <- function(counts, B=1000){
  k <- length(counts)
  n <- sum(counts)
  # one bootstrap sample of size n per column, drawn with replacement
  mat <- matrix(sample(1:k, B*n, replace=TRUE, prob=counts/n), ncol=B)
  resample.stat <- apply(mat,2,seven.advantage,k)
  resample.stat
}

which returns the (raw) resampled statistics.

The observed seven.advantage was
> seven.advantage(rep(1:10,c( 9, 18, 24, 17, 38, 30, 70, 12, 11, 8 )),10)
[1] 0.1350211

i.e. seven is 13.5 percentage points more frequent than its best competitor.
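
As a check on the arithmetic, the observed value can be reproduced
directly from the counts. The sketch below assumes that seven.advantage()
always equals (count of sevens minus the largest count among the other
numbers) / n; this appears to follow from its definition, because a tie
for first place gives zero in either branch of the ifelse(). The names
counts, n and adv are introduced here only for illustration.

```r
counts <- c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8)
n <- sum(counts)                        # total number of observations
adv <- (counts[7] - max(counts[-7])) / n  # sevens minus best competitor
adv
```

Here the best competitor is five, with 38 observations, so the value is
(70 - 38) / 237.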

Resampling from the original data gives
> temp <- seven.test.2(c( 9, 18, 24, 17, 38, 30, 70, 12, 11, 8 ))
> hist(temp)
> quantile(temp,c(0.025,0.5,0.975))
      2.5%     50.0%     97.5%
0.05063291 0.1265823 0.2110759

where the confidence interval lies clearly above a seven.advantage of 0%.


By contrast, approximately equal probabilities give
> temp <- seven.test.2(c( 23, 23, 23, 24, 24, 24, 24, 24, 24, 24 ))
> hist(temp)
> quantile(temp,c(0.025,0.5,0.975))
       2.5%       50.0%      97.5%
-0.07605485 -0.02953586 0.01687764

i.e. seven.advantage rarely exceeds 2%.
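
One possible way to put the two results side by side, offered only as a
sketch: count how often the advantage resampled under approximately equal
probabilities reaches the observed advantage. The code assumes rmultinom()
is available (it is in current R) and computes the advantage as (count of
sevens minus the largest other count) / n, which should agree with
seven.advantage() because ties give zero either way. The names observed,
null.counts, obs.adv and null.adv are mine, not from the functions above.

```r
set.seed(2)  # only to make this sketch reproducible
observed    <- c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8)
null.counts <- c(23, 23, 23, 24, 24, 24, 24, 24, 24, 24)
n <- sum(observed)

# advantage of seven over its best competitor, from a vector of counts
advantage <- function(tab) (tab[7] - max(tab[-7])) / sum(tab)

obs.adv <- advantage(observed)
# 1000 resampled tables under approximately equal probabilities
null.tabs <- rmultinom(1000, size = n, prob = null.counts / sum(null.counts))
null.adv <- apply(null.tabs, 2, advantage)

# share of null resamples with an advantage at least as large as observed
mean(null.adv >= obs.adv)
```

If one accepts the near-equal counts as a stand-in for the null, this
share can be read as a rough one-sided Monte Carlo p-value.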

Of course, this was only a non-Bayesian look at two alternatives.

Anything wrong with this?

Best regards

Jens Oehlschlaegel

--
Jens Oehlschlaegel-Akiyoshi
Psychologist/Statistician
Project TR-EAT + COST Action B6
                                                 F.rankfurt
oehl@psyres-stuttgart.de                         A.ttention
+49 711 6781-408 (phone)                         I.nventory
+49 711 6876902  (fax)                           R .-----.
                                                  / ----- \
Center for Psychotherapy Research                | | 0 0 | |
Christian-Belser-Strasse 79a                     | |  ?  | |
D-70597 Stuttgart Germany                         \ ----- /
-------------------------------------------------- '-----' -
(general disclaimer)                             it's better
