Hypothesis Testing: Two Proportions (aka Difference of Proportions)
Introduction
Two-proportion hypothesis tests are used when:
- You are comparing two different populations
- You have TWO proportions from TWO INDEPENDENT random samples
For example, as a researcher, you might want to know if there is a difference in the proportion of males who use Facebook and the proportion of females who use Facebook. A quality control specialist might also want to know if there is a difference in the percentage of defective items produced by two different machines.
A few symbols need to be defined before we dive in:
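- $\hat{p}_1$ and $\hat{p}_2$ refer to the sample proportions that you will use to test the null hypothesis.
- $\hat{q}_1$ and $\hat{q}_2$ refer to $1 - \hat{p}_1$ and $1 - \hat{p}_2$.
- $\hat{p}_c$ refers to the combined (pooled) proportion of both samples (formula down below ↓ )
- $\hat{q}_c$ refers to 1 minus the combined proportion, i.e. $\hat{q}_c = 1 - \hat{p}_c$
- $n_1$ and $n_2$ refer to the sample sizes.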
Example
A columnist claims that women are more safety-conscious than men when it comes to driving. A recent survey on use of seatbelts was done among a random sample of 150 men and 250 women. Based on the results, 105 men said they always wear seatbelts when driving and 186 women said the same. Using a 0.05 level of significance, do the results of the survey support the columnist’s claim?
Step 1: Name Test: 2-Proportions / Difference of Proportions
Step 2: Define Test:
The null hypothesis for this test is always $H_0: p_1 = p_2$. With this null hypothesis, the options for the alternative hypothesis are as follows:
| Left-Sided Test | Two-Sided Test | Right-Sided Test |
|---|---|---|
| $H_0: p_1 = p_2$ | $H_0: p_1 = p_2$ | $H_0: p_1 = p_2$ |
| $H_A: p_1 < p_2$ | $H_A: p_1 \neq p_2$ | $H_A: p_1 > p_2$ |
In this case, let's call the proportion of men who always wear seatbelts $p_M$ and the proportion of women who always wear seatbelts $p_W$. If the alternative hypothesis is that women are more safety-conscious than men, then women should have a higher seatbelt usage rate, so $p_W > p_M$. This makes it a right-sided test:
$H_0: p_W = p_M$
$H_A: p_W > p_M$
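Step 3: Assume $H_0$ is true and define its normal distribution. Then check the conditions.
1. The data is drawn from TWO independent random samples.
2a. Population 1 is at least 10 times as large as Sample 1: $N_1 \ge 10n_1$
2b. Population 2 is at least 10 times as large as Sample 2: $N_2 \ge 10n_2$
3a. From Sample 1: $n_1\hat{p}_1 \ge 10$ and $n_1\hat{q}_1 \ge 10$
3b. From Sample 2: $n_2\hat{p}_2 \ge 10$ and $n_2\hat{q}_2 \ge 10$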
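Conditions 1 and 2 concern how the samples were collected and how large the populations are, which the survey counts alone cannot confirm. Condition 3 can be checked directly. Below is a minimal Python sketch, assuming a "success" is a respondent who always wears a seatbelt. Because $n\hat{p} = x$ and $n\hat{q} = n - x$, the check reduces to the integer counts. The function name `success_failure_ok` is hypothetical, chosen only for this illustration.

```python
def success_failure_ok(x: int, n: int, threshold: int = 10) -> bool:
    """Check n*p_hat >= threshold and n*q_hat >= threshold for one sample."""
    # n * p_hat = x (successes) and n * q_hat = n - x (failures),
    # so the check can be done exactly on the integer counts.
    return x >= threshold and (n - x) >= threshold


# Example survey: (number who always wear seatbelts, sample size)
samples = {"women": (186, 250), "men": (105, 150)}

for group, (x, n) in samples.items():
    ok = success_failure_ok(x, n)
    print(f"{group}: n*p_hat = {x}, n*q_hat = {n - x}, ok = {ok}")
```

For the survey it reports 186 and 64 for the women and 105 and 45 for the men, so condition 3 holds for both samples.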
Step 4: Using the normal distribution, calculate the test statistic and p-value.
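Although the full formula is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_c\hat{q}_c}{n_1} + \frac{\hat{p}_c\hat{q}_c}{n_2}}},$$
it can be simplified. Recall that the null is $H_0: p_1 = p_2$. Thus, $p_1 - p_2 = 0$. This leaves us with the formula below:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c\hat{q}_c}{n_1} + \frac{\hat{p}_c\hat{q}_c}{n_2}}}$$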
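Now, let's consider how to calculate the combined proportion $\hat{p}_c$. Recall that the proportion $\hat{p}$ of a sample having a certain attribute is given by $\hat{p} = \frac{x}{n}$, where $x$ is the number of elements in the sample possessing that attribute and $n$ is the sample size. Thus, the combined proportion $\hat{p}_c$ pools both samples, where $x_1$ and $x_2$ are the numbers of elements with the attribute in Sample 1 and Sample 2:
$$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$$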
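The pooled proportion and the simplified z formula translate directly into code. The following is a sketch in Python using only the standard library. The function names are hypothetical, and Sample 1 is taken to be the women so that a positive z points toward $H_A: p_W > p_M$.

```python
import math


def pooled_proportion(x1: int, n1: int, x2: int, n2: int) -> float:
    """Combined proportion p_c = (x1 + x2) / (n1 + n2)."""
    return (x1 + x2) / (n1 + n2)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z = (p1_hat - p2_hat) / sqrt(p_c*q_c/n1 + p_c*q_c/n2), assuming H0: p1 = p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_c = pooled_proportion(x1, n1, x2, n2)
    q_c = 1 - p_c
    standard_error = math.sqrt(p_c * q_c / n1 + p_c * q_c / n2)
    return (p1_hat - p2_hat) / standard_error


# Sample 1 = women (186 of 250), Sample 2 = men (105 of 150)
z = two_proportion_z(186, 250, 105, 150)
print(f"p_c = {pooled_proportion(186, 250, 105, 150):.4f}, z = {z:.3f}")
```

Because $\hat{p}_c\hat{q}_c$ appears in both terms under the square root, the standard error could equally be written as $\sqrt{\hat{p}_c\hat{q}_c\,(1/n_1 + 1/n_2)}$; the sketch keeps the two-term form to mirror the formula.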
Test Statistic:
$\hat{p}_W = \frac{186}{250} = 0.744$ and $\hat{p}_M = \frac{105}{150} = 0.70$
$\hat{p}_c = \frac{186 + 105}{250 + 150} = \frac{291}{400} = 0.7275$
$$z = \frac{0.744 - 0.70}{\sqrt{\frac{(0.7275)(0.2725)}{250} + \frac{(0.7275)(0.2725)}{150}}} = \frac{0.044}{0.04598} \approx 0.957$$
P-Value:
The p-value can be found using the normalcdf function on your calculator:
- lower limit: z
- upper limit: 999
- distribution center: 0
- standard deviation: 1
- All together, it looks like this: normalcdf(z, 999, 0, 1)
*Note: For a left-sided test, the lower limit would be -999 and the upper limit would be the test statistic (z). For a two-sided test, compute normalcdf(|z|, 999, 0, 1) and double it. The value 999 simply stands in for infinity, since the standard normal curve has essentially no area that far out.
In this case, we compute normalcdf(0.957, 999, 0, 1) to get a p-value of about 0.17.
Step 5: Analyze your results and determine if they are statistically significant.
We calculated a p-value of about 0.17. This p-value is greater than the significance level of 0.05. Therefore, we FAIL to reject the null hypothesis. The data does NOT provide enough evidence to support the columnist’s claim that a higher proportion of women than men always wear seatbelts when driving.
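As a cross-check on the calculator steps, the whole test can be sketched in Python. This sketch assumes the standard normal tail area is computed as $P(Z > z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$ via `math.erfc`, which plays the role of normalcdf(z, 999, 0, 1). The `alternative` labels and the function names are this example's own, not a library API. It also covers the left- and two-sided cases from the Step 2 table.

```python
import math


def normal_tail_p_value(z: float, alternative: str) -> float:
    """p-value for a standard normal test statistic z (hypothetical helper)."""
    if alternative == "greater":      # right-sided, like normalcdf(z, 999, 0, 1)
        return 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "less":         # left-sided, like normalcdf(-999, z, 0, 1)
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if alternative == "two-sided":    # area in both tails beyond |z|
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value: float, alpha: float = 0.05) -> str:
    # Convention used here: a p-value equal to alpha counts as significant.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Survey data: Sample 1 = women (186 of 250), Sample 2 = men (105 of 150)
x1, n1, x2, n2 = 186, 250, 105, 150
p_c = (x1 + x2) / (n1 + n2)
# Same as sqrt(p_c*q_c/n1 + p_c*q_c/n2), with p_c*q_c factored out
se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se
p = normal_tail_p_value(z, "greater")  # H_A: p_W > p_M
print(f"z = {z:.3f}, p-value = {p:.3f}, decision: {decide(p)}")
```

With the survey counts and a significance level of 0.05, it reaches the same fail-to-reject decision as Step 5.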