Here's the idea I announced: using the bootstrap.
The statistic of interest is
seven.wins <- function(x, k){
  tab <- tabulate(x, k)        # frequency of each number 1..k
  o <- order(tab, runif(k))    # ascending by count, breaking ties at random
  winner <- o[k]               # the most frequent number
  winner == 7
}
which returns TRUE if seven has highest frequency in one sample. The
following function
seven.test <- function(counts, B=1000){
  k <- length(counts)
  n <- sum(counts)
  # each column of mat is one bootstrap sample of size n
  mat <- matrix(sample(1:k, B*n, TRUE, counts/n), ncol=B)
  resample.stat <- apply(mat, 2, seven.wins, k)
  table(resample.stat) / B
}
does bootstrap resampling (B replications), evaluates each bootstrap
sample, and returns the distribution of seven.wins().
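As a side note, the same resampling can be sketched by drawing the B
count vectors directly instead of building the B*n matrix of raw draws.
This assumes R's rmultinom() is available; the name seven.test.counts
is hypothetical and not part of the functions above. It returns only
the proportion of resamples in which seven wins.

```r
# Sketch only: assumes R's rmultinom(); seven.test.counts is a made-up name.
seven.test.counts <- function(counts, B = 1000) {
  k <- length(counts)
  n <- sum(counts)
  # k x B matrix: column b holds the category counts of bootstrap sample b
  tabs <- rmultinom(B, size = n, prob = counts / n)
  # same rule as seven.wins(): largest count wins, ties broken at random
  wins <- apply(tabs, 2, function(tab) order(tab, runif(k))[k] == 7)
  mean(wins)  # proportion of resamples in which seven wins
}

set.seed(1)
p.obs <- seven.test.counts(c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8))
```

This avoids tabulating each column, at the price of no longer having
the raw resampled observations at hand.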
The observed value of the statistic seven.wins() was
> seven.wins(rep(1:10,c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8 )),10)
[1] T
Resampling from the original data gives
> seven.test(c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8 ))
TRUE
1
which corresponds to a quite narrow confidence interval around TRUE.
However, approximately equal probabilities give
> seven.test(c(23, 23, 23, 24, 24, 24, 24, 24, 24, 24 ))
FALSE TRUE
0.889 0.111
i.e. under the sharp null, it is not rare for seven to be the winner
(about 11% of the resamples), but so far we have ignored the margin
by which it wins.
We could use a more sensitive statistic seven.advantage()
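seven.advantage <- function(x, k){
  n <- length(x)
  tab <- tabulate(x, k)
  o <- order(tab, runif(k))    # ascending by count, breaking ties at random
  winner <- o[k]
  # compare seven with the runner-up if it wins, else with the winner
  (tab[7] - ifelse(winner==7, tab[o][k-1], tab[o][k])) / n
}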
which returns the difference between the proportion of sevens and the
proportion of the next most frequent number (or of the most frequent
number, if seven is not the winner), and
seven.test.2 <- function(counts, B=1000){
  k <- length(counts)
  n <- sum(counts)
  mat <- matrix(sample(1:k, B*n, TRUE, counts/n), ncol=B)
  resample.stat <- apply(mat, 2, seven.advantage, k)
  resample.stat
}
which returns the (raw) resampled statistics.
The observed seven.advantage was
> seven.advantage(rep(1:10,c( 9, 18, 24, 17, 38, 30, 70, 12, 11, 8 )),10)
[1] 0.1350211
i.e. the proportion of sevens is 13.5 percentage points above that of
the best competitor.
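The observed value can be checked by hand from the counts. The short
sketch below only redoes that arithmetic; the variable names are mine.

```r
counts <- c(9, 18, 24, 17, 38, 30, 70, 12, 11, 8)
n <- sum(counts)                     # 237 observations in total
runner.up <- max(counts[-7])         # 38, the count of the number 5
adv <- (counts[7] - runner.up) / n   # (70 - 38) / 237
round(adv, 7)                        # 0.1350211
```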
Resampling from the original data gives
> temp <- seven.test.2(c( 9, 18, 24, 17, 38, 30, 70, 12, 11, 8 ))
> hist(temp)
> quantile(temp,c(0.025,0.5,0.975))
2.5% 50.0% 97.5%
0.05063291 0.1265823 0.2110759
where the confidence interval lies clearly above a seven.advantage of 0%.
By contrast, equal probabilities give
> temp <- seven.test.2(c( 23, 23, 23, 24, 24, 24, 24, 24, 24, 24 ))
> hist(temp)
> quantile(temp,c(0.025,0.5,0.975))
2.5% 50.0% 97.5%
-0.07605485 -0.02953586 0.01687764
i.e. seven.advantage rarely exceeds 2%.
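To put a number on "rarely", one could count how often the resamples
under approximately equal probabilities exceed 2%, or reach the observed
seven.advantage of 32/237. The following is a minimal sketch, assuming
R's rmultinom(); null.adv and obs.adv are names I made up. Note that the
random tie-breaking does not matter for the advantage itself, since a
tie at the top gives 0 either way, so max(tab[-7]) is enough.

```r
# Sketch only: tail proportions under (approximately) equal probabilities.
null.counts <- c(23, 23, 23, 24, 24, 24, 24, 24, 24, 24)
obs.adv <- 32 / 237        # observed seven.advantage reported earlier
B <- 1000
n <- sum(null.counts)

set.seed(1)
# k x B matrix of resampled counts, one bootstrap sample per column
tabs <- rmultinom(B, size = n, prob = null.counts / n)
# seven's proportion minus that of its best competitor
null.adv <- apply(tabs, 2, function(tab) (tab[7] - max(tab[-7])) / n)

p.above.2pct <- mean(null.adv > 0.02)      # share of resamples above 2%
p.reach.obs  <- mean(null.adv >= obs.adv)  # share reaching the observed value
```

The second proportion can be read as a rough one-sided resampling
p-value for the observed advantage, under the stated equal-probability
assumption.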
Of course, this was only a non-Bayesian discussion of two alternatives.
Anything wrong with this?
Best regards
Jens Oehlschlaegel
--
Jens Oehlschlaegel-Akiyoshi
Psychologist/Statistician
Project TR-EAT + COST Action B6
Center for Psychotherapy Research
Christian-Belser-Strasse 79a
D-70597 Stuttgart Germany
oehl@psyres-stuttgart.de
+49 711 6781-408 (phone)
+49 711 6876902 (fax)