A New View of Statistics
SIMULATION FOR SAMPLE SIZE
Even if you can't make your program do the above steps automatically, it's worth making up some values and entering them by hand into a data set, then analyzing them. Make the effect you're interested in big to start with, so you can see which part of the output of the program corresponds to the thing you're looking for. Then try it with a small effect, and see if you still get significance. Remember that in the traditional approach, the smallest worthwhile effect is supposed to turn out significant 80% of the time.
Regard the rest of this page as an appendix. I describe how I
generate subjects and variables in SAS. The SAS language is a kind of
BASIC, so you should be able to follow it and adapt it to other
programs. I show a simulation for a cross-sectional study (where
validity can be an issue), and for a longitudinal study (where
reliability is crucial), and only for a numeric dependent
variable.
Cross-Sectional Study
Let's make two groups of 100 subjects differing by an
effect size of 0.2 for a variable with validity 0.9.
In SAS the function rannor(0) generates one randomly chosen value for a normally distributed variable with population mean of 0 and SD of 1. (The "0" has nothing to do with a mean of 0, by the way. It is just a starting "seed" number; in SAS a seed of 0 tells the generator to take its seed from the computer's clock.) Your stats program should have something like rannor(0). Here I have assigned it to a variable called true (standing for a subject's true value):
true=rannor(0);
I usually stick with means of 0 and SDs of 1, but if you wanted to make it, say, 70 ± 6, you'd write:
true=70+6*rannor(0);
These few lines of code generate 100 subjects:
do subject=1 to 100;
true=rannor(0);
output;
end;
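For readers adapting this to another program, here is a minimal Python sketch of the same loop. It assumes that the standard library's random.gauss(0, 1) is an acceptable stand-in for rannor(0). The function name make_subjects and the fixed seed are illustrative choices of mine, not part of the SAS original.

```python
import random
import statistics

def make_subjects(n, mean=0.0, sd=1.0, seed=1):
    """Return n simulated true values from a normal distribution.

    random.gauss(0, 1) plays the role of SAS rannor(0) here (an assumption).
    The fixed seed only makes the run repeatable.
    """
    rng = random.Random(seed)
    return [mean + sd * rng.gauss(0.0, 1.0) for _ in range(n)]

true_values = make_subjects(100)             # population mean 0, SD 1
scaled = make_subjects(100, mean=70, sd=6)   # the 70 ± 6 example

# Sample statistics will wander around the population values.
print(round(statistics.mean(true_values), 2), round(statistics.stdev(true_values), 2))
print(round(statistics.mean(scaled), 1), round(statistics.stdev(scaled), 1))
```

With only 100 subjects the sample mean and SD will not match the population values exactly, which is the whole point of simulating.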
Now let's generate a variable called depvar (standing for dependent variable) with a validity of 0.9 (its correlation with true). I like to do it in such a way that depvar still has a mean of 0 and an SD of 1. I use rannor(0) again to generate a normally distributed source of error, then add a bit of it in with most of true. In the following, sqrt stands for square root:
do subject=1 to 100;
true=rannor(0);
depvar=0.9*true+sqrt(1-0.9**2)*rannor(0);
output;
end;
The fact that the population correlation of depvar with true is 0.9 follows from the definition of the correlation coefficient (for the geeks, the correlation coefficient = the covariance of the two variables, divided by the product of their SDs). Here the covariance is 0.9, and both SDs are 1, so the correlation is 0.9/(1*1) = 0.9.
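If you would rather check the validity empirically than trust the algebra, a sketch along the following lines may help. It is written in Python rather than SAS, uses a large sample so that the sample correlation sits close to the population value, and the helper names pearson and simulate_validity are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_validity(n, validity, seed=1):
    """Generate true values and an observed variable with the given validity."""
    rng = random.Random(seed)
    true, depvar = [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        # Most of the true value plus a bit of independent error, so that
        # depvar keeps a population SD of 1.
        d = validity * t + math.sqrt(1 - validity ** 2) * rng.gauss(0.0, 1.0)
        true.append(t)
        depvar.append(d)
    return true, depvar

true, depvar = simulate_validity(100_000, 0.9)
r_observed = pearson(true, depvar)
print(f"Sample correlation of depvar with true: {r_observed:.3f}")
```

With 100 subjects instead of 100,000 the sample correlation will scatter noticeably around 0.9.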
It's now dead easy to make another set of 100 subjects with a true effect size of 0.2 relative to the first 100. Study this closely, because it shows how a true effect of 0.2 is degraded to an observed effect of 0.9*0.2 when the validity is 0.9:
do subject=101 to 200;
true=rannor(0)+0.2;
depvar=0.9*true+sqrt(1-0.9**2)*rannor(0);
output;
end;
Now do a t test and see if you get statistical significance. Write a program to do it 1000 times and see what percentage of the tests gives statistical significance, and hey presto, that's your power. It would be lousy with only 100 subjects in each group!
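The repeat-it-1000-times idea can be sketched as follows in Python. This is one possible implementation under stated assumptions, not the SAS program I used: the t test is an equal-variance two-sample test computed by hand, and the critical value of 1.972 (two-tailed 5% level, 198 degrees of freedom) is hard-coded rather than looked up by a stats library.

```python
import math
import random
import statistics

VALIDITY = 0.9
EFFECT = 0.2          # true effect size for the second group
N_PER_GROUP = 100
T_CRIT = 1.972        # assumed: approx. two-tailed 5% critical t for 198 df

def make_group(rng, n, effect):
    """Observed values for n subjects whose true values are shifted by effect."""
    error_sd = math.sqrt(1 - VALIDITY ** 2)
    return [VALIDITY * (rng.gauss(0.0, 1.0) + effect)
            + error_sd * rng.gauss(0.0, 1.0) for _ in range(n)]

def t_statistic(a, b):
    """Equal-variance (pooled) two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))

def estimate_power(n_trials=1000, seed=1):
    """Fraction of simulated studies in which the t test is significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        control = make_group(rng, N_PER_GROUP, 0.0)
        treated = make_group(rng, N_PER_GROUP, EFFECT)
        if abs(t_statistic(control, treated)) > T_CRIT:
            hits += 1
    return hits / n_trials

power = estimate_power()
print(f"Estimated power: {power:.2f}")
```

To find the sample size for 80% power, you could rerun estimate_power with larger values of N_PER_GROUP (adjusting T_CRIT for the new degrees of freedom) until the estimate reaches 0.80.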
I'll leave it to you to work out how to generate a simulation for
a correlation between two variables, each with its own
less-than-perfect validity.
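One possible way to approach that exercise is sketched below in Python. It is my assumption about a reasonable setup, not a worked answer from the SAS original: the two true values are given a population correlation rho, and each observed variable is then built from its own true value with its own validity. The names rho, v_x and v_y, and the values chosen for them, are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_observed_correlation(n, rho, v_x, v_y, seed=1):
    """Sample correlation between two observed variables with validities v_x, v_y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        # True values with population correlation rho.
        x_true = rng.gauss(0.0, 1.0)
        y_true = rho * x_true + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Each observed variable adds its own independent measurement error.
        xs.append(v_x * x_true + math.sqrt(1 - v_x ** 2) * rng.gauss(0.0, 1.0))
        ys.append(v_y * y_true + math.sqrt(1 - v_y ** 2) * rng.gauss(0.0, 1.0))
    return pearson(xs, ys)

r_obs = simulate_observed_correlation(100_000, rho=0.5, v_x=0.9, v_y=0.8)
print(f"Observed correlation: {r_obs:.3f}")
```

In this setup the population value of the observed correlation is rho*v_x*v_y, so a true correlation of 0.5 shrinks to 0.5*0.9*0.8 = 0.36.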
Longitudinal Study
The trick here is to generate two or more correlated
repeated measures. We'll do two and call them repvar1 and
repvar2. The correlation between the measures is the
reliability correlation, of course. Once again you generate true
values for your subjects, then add error, this time in a slightly
different way. Let's generate repvar1 and repvar2
with a reliability correlation of 0.95 for 20 subjects:
do subject=1 to 20;
true=rannor(0);
repvar1=sqrt(0.95)*true+sqrt(1-0.95)*rannor(0);
repvar2=sqrt(0.95)*true+sqrt(1-0.95)*rannor(0);
output;
end;
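A Python sketch of the same construction follows, with a check that the two repeated measures really do correlate at about 0.95. The check uses far more than 20 subjects so the sample correlation is stable; the function names are hypothetical and random.gauss stands in for rannor(0).

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_repeated(n, reliability, seed=1):
    """Two repeated measures sharing a true value, each with its own error."""
    rng = random.Random(seed)
    shared = math.sqrt(reliability)       # weight on the true value
    noise = math.sqrt(1 - reliability)    # weight on the independent error
    repvar1, repvar2 = [], []
    for _ in range(n):
        true = rng.gauss(0.0, 1.0)
        repvar1.append(shared * true + noise * rng.gauss(0.0, 1.0))
        repvar2.append(shared * true + noise * rng.gauss(0.0, 1.0))
    return repvar1, repvar2

small1, small2 = simulate_repeated(20, 0.95)        # the 20 subjects in the text
big1, big2 = simulate_repeated(100_000, 0.95)       # large sample for checking
r_retest = pearson(big1, big2)
print(f"Retest correlation in the large sample: {r_retest:.3f}")
```

The weights are square roots because the shared part contributes sqrt(0.95)*sqrt(0.95) = 0.95 to the covariance of the two measures, each of which has an SD of 1.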
There are two ways to add in an effect, let's say 0.2 for repvar2. The normal way is to add it to the true value, just as we did for the cross-sectional design:
repvar2=sqrt(0.95)*(true+0.2)+sqrt(1-0.95)*rannor(0);
But sometimes repvar is the criterion outcome measure, so it may not be appropriate to consider that the effect is degraded by the less-than-perfect reliability. For example, if repvar represents competitive performance, we may be interested in detecting an effect of 0.2 for repvar, not for true. I'm still thinking about this one. In such cases, this is how you add in the effect:
repvar2=sqrt(0.95)*true+sqrt(1-0.95)*rannor(0)+0.2;
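The difference between the two ways of adding the effect is easiest to see in the mean change from repvar1 to repvar2. The Python sketch below applies both versions to the same simulated subjects and errors; it is illustrative only, and the names mean_changes, change_via_true and change_via_observed are mine.

```python
import math
import random

RELIABILITY = 0.95
EFFECT = 0.2

def mean_changes(n, seed=1):
    """Mean of repvar2 - repvar1 for both ways of adding the effect."""
    rng = random.Random(seed)
    shared = math.sqrt(RELIABILITY)
    noise = math.sqrt(1 - RELIABILITY)
    sum_via_true = 0.0
    sum_via_observed = 0.0
    for _ in range(n):
        true = rng.gauss(0.0, 1.0)
        error1 = rng.gauss(0.0, 1.0)
        error2 = rng.gauss(0.0, 1.0)
        repvar1 = shared * true + noise * error1
        # Way 1: effect added to the true value, so it is scaled by sqrt(reliability).
        repvar2_via_true = shared * (true + EFFECT) + noise * error2
        # Way 2: effect added directly to the observed measure.
        repvar2_via_observed = shared * true + noise * error2 + EFFECT
        sum_via_true += repvar2_via_true - repvar1
        sum_via_observed += repvar2_via_observed - repvar1
    return sum_via_true / n, sum_via_observed / n

change_via_true, change_via_observed = mean_changes(100_000)
print(f"Effect added to true:     mean change {change_via_true:.3f}")
print(f"Effect added to observed: mean change {change_via_observed:.3f}")
```

In the population the first version gives a mean change of sqrt(0.95)*0.2, or about 0.195, and the second gives exactly 0.2. With a reliability this high the two barely differ, but the gap grows as reliability falls.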
It's possible to include less-than-perfect validity along with reliability for variables in a longitudinal simulation. If the reliability is r, the validity is v, and the effect size is es, then the following generates two variables (repvar1 and repvar2) that have a correlation of r with each other and that have a correlation of v with the true value. Set r, v and es to numeric values first, and note that r cannot be less than v**2, otherwise the square root of r-v**2 is undefined:
do subject=1 to 20;
true=rannor(0);
errorv=rannor(0);
repvar1=v*true+sqrt(r-v**2)*errorv+sqrt(1-r)*rannor(0);
repvar2=v*(true+es)+sqrt(r-v**2)*errorv+sqrt(1-r)*rannor(0);
output;
end;
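The same construction can be sketched and checked in Python as below. The values r = 0.95, v = 0.9 and es = 0.2 are example choices of mine, and the function name is hypothetical. The sketch assumes, as the square root requires, that r is at least v**2.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_longitudinal(n, r, v, es, seed=1):
    """Repeated measures with reliability r, validity v and effect size es."""
    if r < v ** 2:
        raise ValueError("reliability r must be at least v**2")
    rng = random.Random(seed)
    w_stable = math.sqrt(r - v ** 2)   # stable error shared by both measures
    w_random = math.sqrt(1 - r)        # fresh error on each measure
    true, repvar1, repvar2 = [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        errorv = rng.gauss(0.0, 1.0)
        true.append(t)
        repvar1.append(v * t + w_stable * errorv + w_random * rng.gauss(0.0, 1.0))
        repvar2.append(v * (t + es) + w_stable * errorv + w_random * rng.gauss(0.0, 1.0))
    return true, repvar1, repvar2

true, repvar1, repvar2 = simulate_longitudinal(100_000, r=0.95, v=0.9, es=0.2)
r_retest = pearson(repvar1, repvar2)
r_valid = pearson(true, repvar1)
print(f"Correlation between measures: {r_retest:.3f}")
print(f"Correlation with true value:  {r_valid:.3f}")
```

The variable errorv is the part of the error that is the same on both occasions: it makes the measures agree with each other (reliability) more than they agree with the true value (validity).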
I have used this simulation to check that the
formulae for longitudinal designs
are correct.
Copyright ©1997
Last updated 16 June 97