Introduction
- Parametric and non-parametric tests of correlation
- Tests of independence based on contingency table data
Tests Concerning Correlation
The strength of the linear relationship between two variables is measured by the correlation coefficient. Whether that relationship is statistically significant is assessed with a hypothesis test concerning the population correlation, $\rho$.
The most common way of setting up hypotheses concerning correlation is to test whether the population correlation differs from 0:
$$H_0: \rho = 0 \quad \text{versus} \quad H_a: \rho \neq 0$$
We can also set up one-sided hypotheses to test whether the population correlation is positive or negative.
Parametric Test of a Correlation
As long as the two variables are normally distributed, we can use the sample correlation, r, for our hypothesis testing.
The formula for the t-test is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
where the statistic follows a t-distribution with n − 2 degrees of freedom if the null hypothesis is true.
The magnitude of r needed to reject the null hypothesis ($H_0: \rho = 0$) decreases as the sample size n increases, for two reasons:
- The number of degrees of freedom increases, so the absolute value of the critical value of t decreases.
- The absolute value of the numerator increases, leading to larger-magnitude t-values.
In other words, as n increases, the probability of a Type II error decreases, all else equal.
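To make the sample-size effect concrete, the t-statistic formula can be solved for r, which gives the smallest |r| that rejects the null as $t_c / \sqrt{n - 2 + t_c^2}$ for a critical value $t_c$. The sketch below is illustrative only: the function name is hypothetical, and the critical values are standard two-sided 5% t-table values assumed here rather than taken from the text.

```python
import math


def min_abs_r_to_reject(n: int, t_crit: float) -> float:
    """Smallest |r| whose t-statistic reaches t_crit.

    Obtained by solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
    """
    return t_crit / math.sqrt(n - 2 + t_crit ** 2)


# Assumed two-sided 5% critical values from a standard t-table (df = n - 2).
T_CRIT_BY_N = {10: 2.306, 30: 2.048, 60: 2.002}

thresholds = {n: min_abs_r_to_reject(n, t) for n, t in T_CRIT_BY_N.items()}
for n, r_min in thresholds.items():
    print(f"n = {n:3d}: reject H0 when |r| > {r_min:.3f}")
```

Both effects listed above show up here: the critical value shrinks as the degrees of freedom grow, and the n − 2 term under the square root grows, so the required |r| falls as n rises.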
The sample correlation between oil prices and the monthly returns of energy stocks in Country A is 0.7986 for the period from January 2014 through December 2018. Can we reject the null hypothesis that the underlying, or population, correlation equals 0 at the 0.05 level of significance?
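Solution:
The period from January 2014 through December 2018 contains 60 monthly observations, so n = 60 and the test statistic is
$$t = \frac{0.7986\sqrt{60-2}}{\sqrt{1-0.7986^2}} \approx 10.105$$
At the 0.05 significance level, the critical value for this test statistic is 2.00 (n = 60, degrees of freedom = 58).
Because 10.105 > 2.00, we can reject the null hypothesis.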
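The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name is a hypothetical helper, and the critical value of 2.00 is taken from the solution rather than computed.

```python
import math


def correlation_t_stat(r: float, n: int) -> float:
    """t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.7986      # sample correlation from the example
n = 60          # monthly observations, January 2014 through December 2018
t_crit = 2.00   # critical value quoted in the solution (df = 58)

t_stat = correlation_t_stat(r, n)
reject = abs(t_stat) > t_crit  # two-sided decision rule

print(f"t = {t_stat:.3f}, reject H0: {reject}")
```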
Non-Parametric Test of Correlation: The Spearman Rank Correlation Coefficient
If the two variables under consideration are not normally distributed, we can use a test based on the Spearman rank correlation coefficient, $r_s$.
The Spearman rank correlation coefficient is equivalent to the usual correlation coefficient but is calculated on the ranks of two variables within their respective samples.
Steps
- Sort the X observations from largest to smallest.
- Assign the number 1 to the largest value observation, the number 2 to the second largest value observation, and so on.
- In the event of a tie, assign the average of the ranks that the tied observations share to each tied observation.
- Repeat the procedure for the observations on Y.
- Calculate the difference in ranks, $d_i$, for each pair of observations on X and Y, and then calculate $d_i^2$ (the squared difference in ranks).
For a sample of size n, the Spearman rank correlation is:
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$
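The steps above can be sketched in plain Python. The helper names and the small data set are hypothetical and chosen only for illustration. Ranks run from largest to smallest, and tied observations share the average of their ranks, as described in the steps.

```python
def average_ranks_desc(values):
    """Rank values with 1 = largest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation from squared rank differences."""
    n = len(x)
    rank_x = average_ranks_desc(x)
    rank_y = average_ranks_desc(y)
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))


# Hypothetical illustration data (not from the text).
x = [10, 20, 30, 40, 50]
y = [5, 3, 4, 1, 2]
print(spearman_rs(x, y))
```

When ties are present, this shortcut formula is only approximately equal to the ordinary correlation computed on the ranks.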
Perform a Spearman rank correlation test based on this sample data. Determine whether to reject the null hypothesis at the 0.05 level of significance if the critical values are ±2.306.
| Observation | Alpha (X) | Expense Ratio (Y) | Rank by X | Rank by Y |
| --- | ----- | ------------- | --------- | --------- |
| 1 | -0.52 | 1.34 | 6 | 6 |
| 2 | -0.13 | 0.4 | 1 | 9 |
| 3 | -0.5 | 1.9 | 5 | 1 |
| 4 | -1.01 | 1.5 | 9 | 2.5 |
| 5 | -0.26 | 1.35 | 3 | 5 |
| 6 | -0.89 | 0.5 | 8 | 8 |
| 7 | -0.42 | 1 | 4 | 7 |
| 8 | -0.23 | 1.5 | 2 | 2.5 |
| 9 | -0.6 | 1.45 | 7 | 4 |
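From the ranks in the table, $\sum d_i^2 = 144.5$, so
$$r_s = 1 - \frac{6(144.5)}{9(9^2-1)} \approx -0.2042$$
The corresponding test statistic is
$$t = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} = \frac{-0.2042\sqrt{7}}{\sqrt{1-(-0.2042)^2}} \approx -0.5518$$
Because −0.5518 lies between the critical values of ±2.306, we fail to reject the null hypothesis.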
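The conclusion can be reproduced directly from the two rank columns in the table. This is a sketch under one assumption: the test statistic takes the same t form as the parametric test, with the Spearman coefficient in place of r. The critical value of 2.306 is the one given in the question.

```python
import math

# Ranks copied from the table, in row order 1..9.
rank_x = [6, 1, 5, 9, 3, 8, 4, 2, 7]          # rank by alpha
rank_y = [6, 9, 1, 2.5, 5, 8, 7, 2.5, 4]      # rank by expense ratio

n = len(rank_x)
sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

# Spearman rank correlation from squared rank differences.
r_s = 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

# Assumed t form: r_s * sqrt(n - 2) / sqrt(1 - r_s^2).
t_stat = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)

t_crit = 2.306  # critical value given in the question
reject = abs(t_stat) > t_crit

print(f"sum d^2 = {sum_d_sq}, r_s = {r_s:.4f}, t = {t_stat:.4f}, reject H0: {reject}")
```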
Tests of Independence Using Contingency Table Data
When dealing with categorical or discrete data presented in the form of a contingency table, we use a chi-squared distributed test statistic.
Suppose we want to test whether a relationship exists between the size and the investment type of exchange-traded funds (ETFs). We can perform a test of independence using a chi-squared distributed test statistic.
This nonparametric test compares the actual observed frequencies with those expected on the basis of independence.
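The test statistic is calculated as:
$$\chi^2 = \sum_{i=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where:
- m = Number of cells in the table
- $O_{ij}$ = Observed frequency in each cell of the table
- $E_{ij}$ = Expected frequency in each cell of the table, assuming independence = $\frac{(\text{Total for row } i) \times (\text{Total for column } j)}{\text{Overall total}}$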
This test statistic has degrees of freedom of (r − 1)(c − 1), where r is the number of categories for the first variable and c is the number of categories of the second variable.
Consider the following contingency table which classifies 1,594 ETFs based on two dimensions: size and investment type.
The three values in each cell are, in order, the observed number of ETFs, the expected frequency, and the scaled squared deviation.
| | Small | Medium | Large | Total |
| ------ | --------------------- | ----------------------- | ----------------------- | ----- |
| Value | 50 / 46.703 / 0.233 | 110 / 120.228 / 0.87 | 343 / 336.07 / 0.143 | 503 |
| Growth | 42 / 33.982 / 1.892 | 122 / 87.482 / 13.62 | 202 / 244.536 / 7.399 | 366 |
| Blend | 56 / 67.315 / 1.902 | 149 / 173.290 / 3.405 | 520 / 484.395 / 2.617 | 725 |
| Total | 148 | 381 | 1065 | 1594 |
Finally, we sum the scaled squared deviations across all nine cells to get a chi-squared test statistic of 32.08025.
With (3 − 1) × (3 − 1) = 4 degrees of freedom and a one-sided test at the 5% level of significance, the critical value is 9.4877.
Since the calculated chi-squared test statistic (32.08025) is greater than 9.4877, we reject the null hypothesis of independence and conclude that ETF size and investment type are related.
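The whole calculation can be reproduced from the observed counts alone. The sketch below uses only the standard library; the variable names are illustrative, and the critical value of 9.4877 is taken from the text rather than computed.

```python
# Observed ETF counts; columns are Small, Medium, Large.
observed = [
    [50, 110, 343],   # Value
    [42, 122, 202],   # Growth
    [56, 149, 520],   # Blend
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        # Expected frequency under independence: row total * column total / overall total.
        e_ij = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (o_ij - e_ij) ** 2 / e_ij

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 9.4877  # 5% critical value for 4 degrees of freedom, from the text
reject = chi_sq > critical_value

print(f"chi-squared = {chi_sq:.3f}, df = {df}, reject independence: {reject}")
```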