Chi-Squared Goodness of Fit
Introduction
The Chi-Squared (\(\chi^2\)) Goodness of Fit test compares an observed sample distribution with an expected probability distribution. It is used when there is a single population and a single categorical variable. There is only a hypothesis test (no confidence interval) for Chi-Squared, and the test is always right-tailed.
\(\chi^2\) Distribution
- It is a continuous density curve.
- There is one parameter (degrees of freedom).
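- All \(\chi^2\) values are greater than or equal to zero.
- The \(\chi^2\) distribution is skewed right, most noticeably for small degrees of freedom (df < 20); it becomes more symmetric as the degrees of freedom increase.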
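As a rough illustration of the last point, the skewness of a \(\chi^2\) distribution with df degrees of freedom is \(\sqrt{8/\text{df}}\), a standard result. The short Python sketch below prints that value for a few degrees of freedom; the function name `chi2_skewness` is only an illustrative choice.

```python
import math

def chi2_skewness(df):
    """Skewness of a chi-squared distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

# Skewness shrinks toward zero (a symmetric shape) as df grows.
for df in (1, 2, 5, 20, 100):
    print(f"df = {df:>3}: skewness = {chi2_skewness(df):.3f}")
```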
Example
A state university claims the following about its student body:
Instate | Out of state | International |
81% | 11% | 8% |
Using a simple random sample, you gather your own data:
Instate | Out of state | International |
125 | 34 | 41 |
(Total = 200)
Based on the data, is there enough evidence to reject the university's claims?
Step 1: Name Test: Chi-Squared Goodness of Fit
Step 2: Define Test:
\(H_0\): The actual population proportions are equal to the stated proportions.
\(H_A\): The actual population proportions differ from the stated proportions.
Step 3: Assume \(H_0\) is true and define its \(\chi^2\) distribution (here, df = 3 \(-\) 1 = 2). Then check the conditions for this test:
1. Data are drawn from a random sample
2. N > 10n (the population is more than 10 times the sample size)
3. All expected counts are at least 5
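Condition 3 depends on the expected counts, which are the sample size multiplied by each claimed proportion. A minimal Python sketch of that check for this example follows; the variable names are illustrative, and the percentages are kept as whole numbers to avoid floating-point rounding.

```python
# Claimed percentages from the university and the sample size from the example.
claimed_percent = {"Instate": 81, "Out of state": 11, "International": 8}
n = 200

# Expected count = sample size x claimed proportion.
expected = {group: n * pct / 100 for group, pct in claimed_percent.items()}

# Condition 3: every expected count must be at least 5.
all_at_least_5 = all(count >= 5 for count in expected.values())

print(expected)
print("All expected counts >= 5:", all_at_least_5)
```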
Step 4: Using the \(\chi^2\) distribution, calculate the test statistic and p-value.
Residency | Instate | Out of state | International |
University Claim | 81% | 11% | 8% |
Data from Sample | 125 | 34 | 41 |
Expected Count (200 × claim) | 162 | 22 | 16 |
Solving the test statistic by hand:
\(\chi^2 = \frac{(125 - 162)^2}{162} + \frac{(34 - 22)^2}{22} + \frac{(41 - 16)^2}{16} \approx 54.059\)
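The same hand calculation can be sketched in a few lines of Python. The list names `observed` and `expected` are illustrative; the values are the sample counts and expected counts from the table.

```python
observed = [125, 34, 41]   # Instate, Out of state, International
expected = [162, 22, 16]   # 200 x claimed proportion for each group

# Sum of (observed - expected)^2 / expected over all categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_sq, 3))  # 54.059
```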
Solving the test statistic with calculator:
L1 = observed, L2 = expected, L3 = (L1 - L2)^2 / L2
Run 1-Var Stats on L3 ⇒ \(\chi^2\) = Σx
Finding the p-value
In your calculator, go to 2nd VARS and scroll to the χ²cdf option. You will enter the following:
- The \(\chi^2\) value as the lower limit
- 999 as the upper limit
- the degrees of freedom
- Note: the degrees of freedom = the number of categories \(-\) 1 (here, 3 \(-\) 1 = 2)
- All together, it looks like this: χ²cdf(\(\chi^2\), 999, df) ⇒ p-value
In this case, you will enter χ²cdf(54.059, 999, 2) and get a p-value of \(1.82 \times 10^{-12}\).
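As a cross-check on the calculator, the right-tail area has a simple closed form when df = 2: \(P(\chi^2 \ge x) = e^{-x/2}\). The sketch below relies on that special case only, so it is not a general replacement for χ²cdf; the function name `chi2_right_tail_df2` is hypothetical.

```python
import math

def chi2_right_tail_df2(x):
    """Right-tail area P(X >= x) for a chi-squared distribution with exactly 2 df."""
    return math.exp(-x / 2)

p_value = chi2_right_tail_df2(54.059)
print(f"{p_value:.2e}")
```

The printed value should agree with the calculator result to the digits shown above.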
Step 5: Analyze your results and determine if they are statistically significant.
The p-value is approximately zero, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis. The data support the claim that the actual residency distribution differs from the residency distribution stated by the university.
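The decision can also be sanity-checked in Python. This sketch again assumes df = 2, where both the right-tail area and the cutoff for a given significance level have closed forms; the variable names are illustrative, and the cutoff comparison is only an equivalent way of stating the same decision.

```python
import math

alpha = 0.05      # significance level used in the example
chi_sq = 54.059   # test statistic from Step 4

# Both formulas below are valid only for df = 2.
p_value = math.exp(-chi_sq / 2)         # right-tail area beyond chi_sq
critical_value = -2 * math.log(alpha)   # statistic whose right-tail area equals alpha

reject_null = p_value < alpha
print("Reject H0:", reject_null)
print(f"Cutoff at alpha = 0.05: {critical_value:.3f}")
```

A statistic of 54.059 lies far beyond the cutoff, which matches the conclusion reached from the p-value.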