Commentary on Linear Models and Effect Magnitudes
Alan M Batterham
Sportscience 14, 58-59, 2010 (sportsci.org/2010/amb.htm)
This latest contribution to the pool of resources at Sportscience is an excellent learning, teaching, and research resource for researchers and research consumers at all levels. In the form of a PowerPoint presentation, it may be used for upper-level teaching (in whole or in parts) and also serves as a reference source for experienced researchers. The presentation builds on and complements the Magnitude Matters slideshow and the Progressive Statistics article published in January 2009 in MSSE. For me, the crux of the presentation is Slide 6, which emphasizes that the right question is not whether there is an effect but how big the effect is. As Will highlights, answering this question requires an a priori definition of the smallest worthwhile effect. This is by no means a trivial task, but it is one that must not be shirked by hiding behind a null-hypothesis testing framework. (As Will once famously remarked from the podium at an ACSM symposium, "if you don't know what matters for your patients or clients, quit the field!") An illustration of the importance of this problem is
the recent call by the UK Medical Research Council/National Institute for
Health Research Methodology Research Programme for proposals concerned with
"how to specify the
targeted difference for a randomised controlled trial." Dr Jonathan Cook
(University of Aberdeen) is now leading this project, which will result in draft guidance for researchers
and funding bodies, including separate sections for different types of trials
and for different ways in which the outcomes of a treatment might be measured.
There are three main methods for arriving
at a minimum important difference: anchor-based methods, distribution-based
methods, and opinion seeking. Will notes in the presentation that clinicians
can’t agree on a value for the smallest worthwhile effect and that in the
absence of clinical consensus we need a statistical default. The approach
Will takes is therefore an example of a distribution-based method, in which changes in scores on an outcome are evaluated
in relation to the variability in scores for that outcome (e.g., thresholds
for the standardised mean difference). In anchor-based methods the aim is to
establish the change in the outcome being measured required to result in a
meaningful change on another measure that has already been shown to be
clinically or practically important to the individual. For example, a
single-anchor method might involve assessing the change in maximum oxygen
uptake required for people to rate their health-related quality of life (the
anchor) as much improved. In my experience, robust anchor-based approaches are
rare in our field, and a statistical distribution-based default is sensible.
Moreover, some work has suggested a reconciliation of anchor-based and
distribution-based approaches, with a near-linear
relationship between effect size and the proportion of patients benefiting
from a treatment (Norman et al., 2001).
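As a concrete illustration of such a distribution-based approach, here is a minimal sketch in Python (with made-up numbers, not data from any study) of the standardised mean difference to which those thresholds are applied:

import numpy as np

def standardised_mean_difference(treatment, control):
    # Raw difference in means divided by the pooled between-subject SD
    treatment, control = np.asarray(treatment), np.asarray(control)
    n1, n2 = len(treatment), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    return (treatment.mean() - control.mean()) / pooled_sd

# Hypothetical changes in VO2max (mL/kg/min) for two groups
print(standardised_mean_difference([4.1, 3.2, 5.0, 2.8], [0.9, 1.5, -0.3, 1.1]))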
My remaining comments relate to specific sections of the presentation.

• Slide 7 gives an example of
two predictors (Strength = a + b*Age + c*Size) with the statement that such
models allow us to work out the “pure” effect of each predictor: "That
is, yeah, kids get stronger as they get older, but is it just because they’re
bigger, or does something else happen with Age? The something else is
given by the 'b'. It’s that
simple!" I would add a caveat here to check for potentially degrading
collinearity. This is pertinent to the example given, as age and body size
may be highly related in growing and maturing children. Collinearity does not violate any of the assumptions of
ordinary least-squares regression and thus gives unbiased predictions from
the linear combination of predictors. However, if your goal is explanation
relating to the relative importance of
individual predictors, then collinearity could be a problem, as it may be
difficult to determine the separate influence of each. Collinearity results
in large standard errors for the affected coefficients (variance
inflation) and is essentially a data problem: insufficient
information (signal) in the data relative to the noise. Sophisticated collinearity
diagnostics are available in many statistical software packages, including
SAS and SPSS.
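By way of illustration, here is a minimal sketch of one such diagnostic, the variance inflation factor, using Python's statsmodels rather than SAS or SPSS; the Age, Size, and Strength values are invented for the Strength example above:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented Age (years) and Size (kg) values for growing children
df = pd.DataFrame({"Age":  [8, 9, 10, 11, 12, 13, 14, 15],
                   "Size": [28, 31, 35, 40, 46, 52, 58, 63]})
X = sm.add_constant(df)  # design matrix with intercept

# Hypothetical strength outcome, then the fitted "pure" effects b and c
strength = [20, 24, 27, 33, 40, 47, 53, 58]
print(sm.OLS(strength, X).fit().params)

# VIFs above roughly 5-10 are often taken to signal degrading collinearity
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))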
• On Slide 15 or 16 it would have been helpful to the reader if there
were a note or link to the source or derivation of the Hopkins scale of
effect magnitudes (for example, the progressive statistics paper), given that
it differs from Cohen’s scale and that this presentation may be the first
stop for some researchers.
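For readers hunting for the numbers, here is a sketch of the two scales as I read them (the Hopkins thresholds are those in the progressive statistics paper; verify against the sources before relying on this):

def magnitude(d, scale="hopkins"):
    # Classify the absolute value of a standardised mean difference
    thresholds = {
        "hopkins": [(0.2, "trivial"), (0.6, "small"), (1.2, "moderate"),
                    (2.0, "large"), (4.0, "very large"),
                    (float("inf"), "extremely large")],
        "cohen":   [(0.2, "trivial"), (0.5, "small"), (0.8, "medium"),
                    (float("inf"), "large")],
    }
    for cutoff, label in thresholds[scale]:
        if abs(d) < cutoff:
            return label

print(magnitude(0.7, "hopkins"), magnitude(0.7, "cohen"))  # moderate vs medium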
• People often get very
confused about the difference between partial
and semi-partial correlations, and
which is better, so I found the plain-language explanations on Slide 21 very useful.
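For anyone who wants to see the distinction in code, a small sketch with simulated data (the variable names are hypothetical): the partial correlation removes the covariate from both variables, whereas the semi-partial removes it from only one.

import numpy as np

def residuals(y, x):
    # Residuals from a least-squares regression of y on x (with intercept)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(1)
age = rng.uniform(8, 16, 50)
size = 4 * age + rng.normal(0, 3, 50)
strength = 2 * age + 0.5 * size + rng.normal(0, 4, 50)

# Partial: Age removed from both Strength and Size
print(np.corrcoef(residuals(strength, age), residuals(size, age))[0, 1])
# Semi-partial: Age removed from Size only
print(np.corrcoef(strength, residuals(size, age))[0, 1])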
• It
crossed my mind when you were dealing with distributional issues,
non-uniformity, and transformations that bootstrapping should get a mention
somewhere. Bootstrapping provides trustworthy confidence limits when some of the assumptions underlying the
linear model are violated, including one you didn't mention directly:
independence of the observations.
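To show what I mean, a minimal sketch of nonparametric percentile bootstrap confidence limits for a difference in means (the data are simulated, and 90% limits are only one possible choice):

import numpy as np

rng = np.random.default_rng(2)
group_a = rng.normal(4.0, 2.0, 20)  # hypothetical treatment outcomes
group_b = rng.normal(1.0, 2.0, 20)  # hypothetical control outcomes

# Resample each group with replacement and recompute the effect many times
boot = [rng.choice(group_a, size=len(group_a)).mean() -
        rng.choice(group_b, size=len(group_b)).mean()
        for _ in range(10000)]

# Percentile 90% confidence limits for the difference in means
print(np.percentile(boot, [5, 95]))

Published July 2010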