Introduction
A basic understanding of statistical concepts is necessary to effectively evaluate existing literature. Statistical results do not, however, allow one to determine the clinical applicability of published findings. Statistical results can be used to make inferences about the probability of an event among a given population. Careful interpretation by the clinician is required to determine the value of the data as it applies to an individual patient or group of patients.[1]
Good research studies provide a clear, testable hypothesis, or prediction, about what they expect to find in the relationships being tested.[2] The hypothesis should be grounded in the empirical literature, based on clinical observations or expertise, and should be innovative in testing a novel relationship or confirming a prior study. There are at minimum two hypotheses in any study: (1) the null hypothesis assumes that there is no difference or no effect, and (2) the experimental or alternative hypothesis predicts that a difference or effect exists. Often the null hypothesis is not stated or is assumed. Hypotheses are tested by examining relationships between independent variables, or those thought to have some effect, and dependent variables, or those thought to be changed or affected by the independent variable. These are also called predictor and outcome variables, respectively.
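The logic of testing a null hypothesis against an alternative can be illustrated with a small sketch. The example below is not a method described in the cited literature; it uses an exact permutation test, chosen here only because it needs no external library. The blood pressure values, the group names, and the function name permutation_p_value are all hypothetical. Under the null hypothesis of no effect, the group labels are interchangeable, so the p-value is the share of possible relabelings that produce a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Null hypothesis: group membership has no effect on the outcome.
    Alternative hypothesis: the group means differ.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0  # relabelings at least as extreme as the observed split
    count = 0    # all possible relabelings
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
# Independent (predictor) variable: group; dependent (outcome) variable: pressure.
control = [120, 125, 130, 128, 122]
treated = [110, 115, 112, 118, 111]

p_value = permutation_p_value(treated, control)
print(f"p = {p_value:.4f}")  # prints "p = 0.0079"
```

In this made-up dataset, only 2 of the 252 possible relabelings are as extreme as the observed split, so the data would be unlikely if the null hypothesis were true. Whether a difference of that size matters for a given patient remains a clinical judgment.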
Statistics are used to test a study’s alternative or experimental hypothesis. Statistical models are fitted based on the nature, type, and other characteristics of the dataset. Data are classified by level of measurement, and the level determines which statistical models can be applied to test a hypothesis.[3] Nominal data are variables containing two or more categories without underlying order or value. Examples of nominal data include indicators of group membership, such as male or female. Ordinal data are nominal data that include an order or rank but have undefined spacing between groups or levels, such as faculty rank or educational level. Interval data are ordinal data with clearly defined, equal spacing between values and no absolute zero point. An example of interval data is temperature measured in degrees Celsius or Fahrenheit: the magnitude of the difference between intervals is consistent and measurable (one degree), but zero does not indicate an absence of temperature. Ratio data are interval data that include an absolute zero, such as the amount of student loan debt. Nominal and ordinal data are categorical, where entities are divided into distinct groups, whereas interval and ratio data are considered continuous, such that each observation receives a numeric score on a measurement scale.[4]
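The nested relationship among the four levels of measurement can be sketched in code. The example below is illustrative only: the names Level, is_categorical, and meaningful_summaries are hypothetical, and the mapping from level to summary statistic follows a conventional rule of thumb rather than anything stated in the cited sources. Each level inherits the summaries that are meaningful at the level below it and adds one more.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1   # categories only, e.g. group membership
    ORDINAL = 2   # ordered categories, e.g. educational level
    INTERVAL = 3  # equal spacing, no absolute zero, e.g. degrees Celsius
    RATIO = 4     # equal spacing with an absolute zero, e.g. loan debt


def is_categorical(level):
    """Nominal and ordinal data are categorical; interval and ratio are continuous."""
    return level <= Level.ORDINAL


def meaningful_summaries(level):
    """Return summaries conventionally considered meaningful at a level.

    This is a rule of thumb assumed for illustration, not a complete guide
    to model selection.
    """
    summaries = ["mode"]                 # counting categories is always valid
    if level >= Level.ORDINAL:
        summaries.append("median")       # requires an ordering
    if level >= Level.INTERVAL:
        summaries.append("mean")         # requires equal spacing
    if level >= Level.RATIO:
        summaries.append("ratio of two values")  # requires a true zero
    return summaries


for level in Level:
    kind = "categorical" if is_categorical(level) else "continuous"
    print(f"{level.name:<8} {kind:<11} {', '.join(meaningful_summaries(level))}")
```

For example, a mean is listed for interval data such as temperature in degrees Celsius, but a statement like "twice as hot" appears only at the ratio level because it depends on an absolute zero.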
It is up to the researcher to appropriately apply statistical models when testing hypotheses. Several approaches can be used to analyze the same dataset, and the choice depends heavily on the wording of the researcher’s hypothesis.[5] A variety of statistical software packages can be used to analyze data; some are available for free, while others charge annual license fees. Nearly all packages require the user to have a basic understanding of the types of data and the appropriate application of statistical models for each type. More sophisticated packages require the user to use the program’s proprietary coding language to perform hypothesis tests. These languages can take considerable time to learn, and errors can easily slip past the untrained eye.
It is strongly recommended that unfamiliar users consult with a statistical analyst when designing and running statistical models. Biostatistician consultations can occur at any time during a study, but earlier consultations are wise to prevent the introduction of accidental bias into study data and to help ensure that data collection methods are accurate and adequate for testing the hypotheses.