Monday, 19 June 2017

TWO-PROPORTION Z-TEST IN R



Recently I needed to compare two proportions using R. I tried with the prop.test  function, but I noticed that it performs a Chi square test. Unfortunately, it seems that there is not a function in R for two-proportion z-test and I had to write it on my own.

I  found only a post in R bloggers , in which the function allows to calculate the z-score. Hence, also using in part a function for one-proportion z-test,  I tried to improve it for getting the estimate of the two proportions, the p-value and the confidence interval (95%) for the difference.



Given that:



where:

      p1= proportion of subjects with the characteristic of interest in the 1st group (x1/n1)
      p2= proportion of subjects with the characteristic of interest in the 2nd group (x2/n2)


and:

Confidence interval of the difference:


where SE of the difference:



The function becomes:

z.prop = function(x1,x2,n1,n2,conf.level=0.95,alternative="two.sided"){
  numerator = (x1/n1) - (x2/n2)
  p.common = (x1+x2) / (n1+n2)
  denominator = sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
  z.prop.ris = numerator / denominator
  p1 <- x1/n1
  p2 <- x2/n2
  p.val <- 2*pnorm(-abs(z.prop.ris))
  if(alternative=="lower") {
            p.val <- pnorm(z.prop.ris)
         }
         if(alternative=="greater") {
            p.val <- 1 - (pnorm(z.prop.ris))
         }
  SE.diff <- sqrt(((p1*(1-p1)/n1)+(p2*(1-p2)/n2)))
  diff.prop <- (p1-p2)
  CI <- (p1-p2)+ c(
          -1*((qnorm(((1 - conf.level)/2) + conf.level))*SE.diff),
         ((qnorm(((1 - conf.level)/2) + conf.level))*SE.diff) )

  return (list(estimate=c(p1,p2),ts.z=z.prop.ris,p.val=p.val,diff.prop=diff.prop,CI=CI))

}



It gives you back a two-tailed p-value when the alternative is not specified.

For example, let’s say that you want to compare the proportion of Helicobacter infection in two groups of individuals: the first is composed by people with gastritis and the second one by healthy individuals.

Group 1 (gastritis): n1=500
Group 2 (healthy): n2=470

Group 1 with Helicobacter infection: x1=350
Group2 with Helicobacter infection: x2=295

It will be:

z.prop(350, 295, 500, 470)   #two-sided


This is what you get back:

$estimate
[1] 0.7000000 0.6276596

$ts.z
[1] 2.385499

$p.val
[1] 0.01705595

$diff.prop
[1] 0.07234043

$CI
[1] 0.0129810 0.1316999


You can also specify the alternative (“lower” or “greater”) for performing  one-tailed test.
Let me know if it works for you or how to improve it.



No comments:

Post a Comment