Recently I needed to compare two proportions
using R. I tried with the prop.test  function, but I noticed that it performs a Chi square test. Unfortunately,
it seems that there is not a function in R for two-proportion z-test and I had
to write it on my own.
I  found
only a post in R
bloggers , in which the function allows to calculate the z-score. Hence,
also using in part a function for one-proportion
z-test,  I tried to improve it for
getting the estimate of the two proportions, the p-value and the confidence
interval (95%) for the difference.
Given
that:
where:
p1= proportion of subjects with the characteristic of interest in the 1st group (x1/n1)
p2= proportion of subjects with the characteristic of interest in the 2nd group (x2/n2)
p1= proportion of subjects with the characteristic of interest in the 1st group (x1/n1)
p2= proportion of subjects with the characteristic of interest in the 2nd group (x2/n2)
and:
Confidence
interval of the difference:
where SE of
the difference:
The
function becomes:
z.prop =
function(x1,x2,n1,n2,conf.level=0.95,alternative="two.sided"){
  numerator =
(x1/n1) - (x2/n2)
  p.common =
(x1+x2) / (n1+n2)
  denominator =
sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
  z.prop.ris =
numerator / denominator
  p1 <- x1/n1
  p2 <- x2/n2
  p.val <-
2*pnorm(-abs(z.prop.ris))
 
if(alternative=="lower") {
           
p.val <- pnorm(z.prop.ris)
         }
        
if(alternative=="greater") {
           
p.val <- 1 - (pnorm(z.prop.ris))
         }
  SE.diff <-
sqrt(((p1*(1-p1)/n1)+(p2*(1-p2)/n2)))
  diff.prop <- (p1-p2)
  CI <- (p1-p2)+ c(
          -1*((qnorm(((1 -
conf.level)/2) + conf.level))*SE.diff),
        
((qnorm(((1 - conf.level)/2) + conf.level))*SE.diff) )
  return
(list(estimate=c(p1,p2),ts.z=z.prop.ris,p.val=p.val,diff.prop=diff.prop,CI=CI))
}
It
gives you back a two-tailed p-value when the alternative is not specified.
For example, let’s say that you want to compare
the proportion of Helicobacter
infection in two groups of individuals: the first is composed by people with
gastritis and the second one by healthy individuals.
Group
1 (gastritis): n1=500
Group
2 (healthy): n2=470
Group
1 with Helicobacter infection: x1=350
Group2
with Helicobacter infection: x2=295
It will be:
z.prop(350, 295, 500, 470)   #two-sided
This
is what you get back:
$estimate
[1] 0.7000000 0.6276596
$ts.z
[1] 2.385499
$p.val
[1] 0.01705595
$diff.prop
[1] 0.07234043
$CI
[1] 0.0129810 0.1316999
You can
also specify the alternative (“lower” or “greater”) for performing  one-tailed test.
Let me know
if it works for you or how to improve it.
