Recently I needed to compare two proportions
using R. I tried with the prop.test function, but I noticed that it performs a Chi square test. Unfortunately,
it seems that there is not a function in R for two-proportion z-test and I had
to write it on my own.
I found
only a post in R
bloggers , in which the function allows to calculate the z-score. Hence,
also using in part a function for one-proportion
z-test, I tried to improve it for
getting the estimate of the two proportions, the p-value and the confidence
interval (95%) for the difference.
Given
that:
where:
p1= proportion of subjects with the characteristic of interest in the 1st group (x1/n1)
p2= proportion of subjects with the characteristic of interest in the 2nd group (x2/n2)
p1= proportion of subjects with the characteristic of interest in the 1st group (x1/n1)
p2= proportion of subjects with the characteristic of interest in the 2nd group (x2/n2)
and:
Confidence
interval of the difference:
where SE of
the difference:
The
function becomes:
z.prop =
function(x1,x2,n1,n2,conf.level=0.95,alternative="two.sided"){
numerator =
(x1/n1) - (x2/n2)
p.common =
(x1+x2) / (n1+n2)
denominator =
sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
z.prop.ris =
numerator / denominator
p1 <- x1/n1
p2 <- x2/n2
p.val <-
2*pnorm(-abs(z.prop.ris))
if(alternative=="lower") {
p.val <- pnorm(z.prop.ris)
}
if(alternative=="greater") {
p.val <- 1 - (pnorm(z.prop.ris))
}
SE.diff <-
sqrt(((p1*(1-p1)/n1)+(p2*(1-p2)/n2)))
diff.prop <- (p1-p2)
CI <- (p1-p2)+ c(
-1*((qnorm(((1 -
conf.level)/2) + conf.level))*SE.diff),
((qnorm(((1 - conf.level)/2) + conf.level))*SE.diff) )
return
(list(estimate=c(p1,p2),ts.z=z.prop.ris,p.val=p.val,diff.prop=diff.prop,CI=CI))
}
It
gives you back a two-tailed p-value when the alternative is not specified.
For example, let’s say that you want to compare
the proportion of Helicobacter
infection in two groups of individuals: the first is composed by people with
gastritis and the second one by healthy individuals.
Group
1 (gastritis): n1=500
Group
2 (healthy): n2=470
Group
1 with Helicobacter infection: x1=350
Group2
with Helicobacter infection: x2=295
It will be:
z.prop(350, 295, 500, 470) #two-sided
This
is what you get back:
$estimate
[1] 0.7000000 0.6276596
$ts.z
[1] 2.385499
$p.val
[1] 0.01705595
$diff.prop
[1] 0.07234043
$CI
[1] 0.0129810 0.1316999
You can
also specify the alternative (“lower” or “greater”) for performing one-tailed test.
Let me know
if it works for you or how to improve it.
No comments:
Post a Comment