Necessary parts of conducting statistics
As I have been adding statistical assumptions to the website, it is safe to say that the "levee" has broken, and I have A LOT more work to do to make this the best statistical and empirical website. With that said, check out the new assumptions pages:
Independence of observations
Each participant in a sample can only be counted as one observation
As a biostatistician, I spend a lot of time testing for normality and homogeneity of variance.
Skewness and kurtosis statistics are used to assess the normality of a continuous variable's distribution. A skewness or kurtosis statistic above an absolute value of 2.0 indicates that the distribution is non-normal. Distributions are often non-normal because of outliers. Any observation that falls more than 3.29 standard deviations away from the mean is considered an outlier.
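As a rough illustration of these two rules of thumb, here is a minimal Python sketch that uses only the standard library. The function names are hypothetical, and the formulas are assumptions on my part: skewness and kurtosis are computed from simple population moments, with kurtosis reported as excess kurtosis (a normal distribution scores 0). Statistical packages often apply small-sample corrections, so their values can differ slightly.

```python
import statistics


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness (no small-sample correction)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def looks_non_normal(values, threshold=2.0):
    """Apply the |statistic| > 2.0 rule of thumb from the text."""
    return (abs(skewness(values)) > threshold
            or abs(excess_kurtosis(values)) > threshold)


def find_outliers(values, cutoff=3.29):
    """Return observations more than `cutoff` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs(x - mean) / sd > cutoff]


# Made-up data: 21 ordinary values plus one extreme observation.
scores = [9, 10, 11] * 7 + [100]
print(looks_non_normal(scores))  # the extreme value inflates skewness and kurtosis
print(find_outliers(scores))     # the extreme value is flagged
```

With small samples the 3.29 cutoff can never be reached, because one extreme value also inflates the standard deviation, so the outlier rule is only informative once a sample has more than a dozen or so observations.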
Levene's Test of Equality of Variances is used to check the assumption of homogeneity of variance. A Levene's Test with a p-value below .05 means that the assumption has been violated. In the event that the assumption is violated, non-parametric tests can be employed.
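The decision rule above can be sketched in Python with SciPy. This is only an illustration under stated assumptions: the function name `compare_two_groups` is hypothetical, the Mann-Whitney U test is my choice of non-parametric fallback for two independent groups, and `center="mean"` selects the classic mean-based form of Levene's test (SciPy's default is the median-based variant).

```python
from scipy import stats


def compare_two_groups(group_a, group_b, alpha=0.05):
    """Check homogeneity of variance, then pick a test accordingly."""
    # Levene's test: a p-value below alpha means the variances differ.
    _, levene_p = stats.levene(group_a, group_b, center="mean")

    if levene_p < alpha:
        # Assumption violated: fall back to a non-parametric test.
        test_name = "Mann-Whitney U"
        _, p_value = stats.mannwhitneyu(group_a, group_b,
                                        alternative="two-sided")
    else:
        # Assumption met: the parametric test is reasonable.
        test_name = "independent-samples t-test"
        _, p_value = stats.ttest_ind(group_a, group_b)

    return {"levene_p": levene_p, "test": test_name, "p_value": p_value}


# Made-up data: similar means, very different spread.
tight = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
wide = [2, 18, 5, 15, 1, 19, 8, 12]
print(compare_two_groups(tight, wide))
```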
There is one more important statistical assumption that sits alongside the aforementioned two: the assumption of independence of observations. Simply stated, this assumption stipulates that study participants are independent of each other in the analysis. They are only counted once.
In between-subjects designs, each study participant is a mutually exclusive observation that is completely independent from all other participants in all other groups.
For within-subjects designs, each participant is independent of other participants. There are just multiple observations of the outcome, per participant.
With this being said, it is common for researchers to take multiple measurements of an outcome and compare them either as if they were independent (oftentimes with differing numbers of observations across participants) or in a within-subjects fashion (ALWAYS with differing numbers of observations of the outcome). By default, these measurements are not independent, and they violate the assumption of independence of observations. What is one to do?
The answer is generalized estimating equations (GEE). This family of statistical tests is robust to multiple observations (or correlated observations) of an outcome and can be used for between-subjects, within-subjects, factorial, and multivariate analyses.
Meeting statistical assumptions is IMPORTANT
Statistics is a flawed mathematical science and assumptions MUST be met
I've read in the literature that somewhere between 30% and 90% of all statistics reported in the medical literature are incorrectly conducted. First of all, that's a WIDE range, and either extreme should be pretty frightening to consumers of healthcare and other related services. If your practitioner is using evidence-based practices, then one would hope that your treatment regimen does NOT fall within that range!
Many times, statistics are incorrect because researchers do not check the statistical assumptions of the tests they use. There are three fundamental statistical assumptions that all researchers should check before running any type of statistic:
1. Normality - If you are using ANY continuous variables, then use skewness and kurtosis statistics to assess their normality. Any variable that has a skewness or kurtosis statistic above an absolute value of 2.0 is assumed to be non-normal.
2. Homogeneity of variance - If you are using between-subjects analyses to compare independent groups on a continuous outcome, then use Levene's test to check the assumption of homogeneity of variance between your independent groups. This assumption holds that the independent groups have similar variances on the outcome. If the p-value for Levene's test is LESS THAN .05, then the assumption has been violated.
3. "Missingness" - Missing data is a constant battle when conducting research. There is a litany of different reasons that lead to missing data, but regardless, missing data can skew the results of a study by under-representing the population of interest. If ANY of your variables have MORE THAN 20% of their observations missing, then that variable should be discarded.
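The 20% rule in item 3 is easy to automate. Here is a minimal standard-library Python sketch, assuming a dataset stored as a dictionary of columns in which missing values are recorded as `None` or NaN. The function names and the example variables are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(column):
    """Share of observations in a column that are missing."""
    return sum(is_missing(v) for v in column) / len(column)


def variables_to_discard(dataset, cutoff=0.20):
    """Names of variables with MORE THAN `cutoff` of their observations missing."""
    return [name for name, column in dataset.items()
            if missing_fraction(column) > cutoff]


# Made-up dataset with 10 participants.
dataset = {
    "age": [34, 51, None, 47, 29, 62, 45, 38, 55, 41],              # 10% missing
    "sbp": [120, None, 135, 128, None, 142, 118, 131, 125, 139],    # 20% missing
    "ldl": [None, 130, None, 118, None, 142, 125, None, 110, 137],  # 40% missing
}
print(variables_to_discard(dataset))
```

A variable with exactly 20% missing is kept here, because the rule as stated applies only to variables with more than 20% of their observations missing.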
Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.
Copyright © 2019 Scalë. All Rights Reserved. Patent Pending.