In this example, test scores (y) rise alongside anxiety (x) up to a point, but then begin to fall as anxiety continues to rise. This scenario seems plausible enough: some anxiety is good and keeps us from getting complacent, but too much can inhibit our performance. What does this do to our correlation, though? The calculated correlation for this non-linear sample is r(xy)=0.35, indicating only a weak linear correlation between test scores (y) and anxiety (x). This is where a correlation coefficient can be misleading. Visual inspection of Fig. 2 shows a clear association; however, a linear measure of correlation largely misses it. Later posts will cover options for detecting non-linear associations between variables.
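To see how a strong relationship can still produce a small r, here is a minimal Python sketch that computes Pearson's r by hand for an inverted-U pattern. The data are hypothetical and noise-free (they are not the sample plotted in Fig. 2): each score is exactly 100 - (anxiety - 5.5)^2, so anxiety completely determines the score. The helper `pearson_r` is our own illustrative function, not part of any library.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: summed cross-deviations divided by the square root of
    the product of the summed squared deviations (the 1/(n-1) factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: anxiety on a 0-10 scale, scores peaking at anxiety 5.5.
anxiety = list(range(11))
scores = [100 - (a - 5.5) ** 2 for a in anxiety]

r = pearson_r(anxiety, scores)
print(f"r = {r:.2f}")  # prints r = 0.34
```

Even though every score is a deterministic function of anxiety, r is only about 0.34, because the rising and falling halves of the curve partly cancel in the linear measure.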
To conclude, when we talk about correlation, we are speaking of a number that summarizes the association between variables. How well that number reflects the actual association depends on whether we have used a measure of correlation that is appropriate for our data. If correlation is detected, we know there is some association; however, we cannot assume that variables are independent of each other (i.e., not associated) just because no correlation is detected.
Key Ideas:
- Correlation measures the association between variables.
- Correlation coefficients range from -1 to 1.
- Linear correlation is typically measured with Pearson's r.
- Variables cannot be assumed independent just because r=0.
- Pearson's r can miss or understate non-linear associations.
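For reference, Pearson's r for n paired observations is the sum of cross-deviations from the means, scaled by the spread of each variable (written here as r_{xy} to match the r(xy) notation used earlier):

$$
r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
$$

By the Cauchy-Schwarz inequality the numerator can never exceed the denominator in absolute value, which is why r always falls between -1 and 1. The numerator only rewards points that deviate from their means in the same (or opposite) direction, so a curve that rises and then falls produces terms that cancel.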