Monday, 18 December 2017

LITTLE TROUBLESHOOTING IN R

Using R, little troubles can arise. Below, a "work in progress" collection of solved problems...or not yet...


If you have your own solution or wish to discuss some topics, your contribution will be welcome.


  • ERRONEOUSLY SAVED WORKSPACE


If you want to delete the workspace you saved, check into the Documents folder and delete both files that appear as "R Workspace" and "Rhistory". That's all!!







  • FISHER'S EXACT TEST ERROR MESSAGES: "FEXACT error 6 and 7"

Have you ever got error messages performing Fisher's exact test? Well, I have got two different error messages that I solved with the same "tricks":

  Error in fisher.test(.Table) : FEXACT error 7.
  LDSTP is too small for this problem.
  Try increasing the size of the workspace.

OR

  Error in fisher.test(.Table) : FEXACT error 6.
  LDKEY is too small for this problem.
  Try increasing the size of the workspace.


As suggested, you can first try to increase the workspace to 2e8 or 2e9: 

  fisher.test(data,workspace=2e8)  #change data with the name of your table


If it still doesn't work, you can go for a simulation:

  fisher.test(data,simulate.p.value=TRUE,B=1e7)

where simulate.p.value is a logical for computing p-value by Monte Carlo simulation and B indicates the number of replicates (by default B=2000), that can be changed as needed.
I know, it is not exactly the same...but at least you can run the analysis.

As usual, if you have a better solution, please let me know!




  • NEW VERSION NOT WRITABLE: " '/R-X.X.X/library' is not writable"

Have you just updated the R version but can't update the library because the folder is not writable?

Don't worry! You have two different options:

1. the easiest way is to create a private library. Just say "Yes" when R will ask you (and take note of the location of the folder);

2. if you still want to write in the library folder, you can get the authorisation following the easy instructions I provided in this post.




  • ANOVA ERROR MESSAGES: "Error in eigen" OR "Error in contrasts"

If you perform a mixed ANOVA, either using
aov() or anova_test() (from rstatix package), you could run into these error messages:

  Error in eigen(SSPE, only.values = TRUE) : 
  infinite or missing values in 'x'

OR

  Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels


In both cases, the problem is represented by missing values (NA).

If you prepare your data excluding NA for your dependent variable with filter(), you will get the first message because there will be no values corresponding to that case in that particular level of the independent variable.

If you decide to solve the problem by eliminating the row with the NA from the dataset, you will get the second error message because R will miss completely that case in that particular level of the independent variable.

To avoid this problem you should remove the rows with that specific case at any level of the independent variable, including those where the value is not missing.

Of course, this can represent a loss of data.

One easy possibility is to substitute the NA with the mean of the group at that level of the independent variable.

However, to decide how to handle missing values, I suggest going on with further readings.





Monday, 19 June 2017

TWO-PROPORTION Z-TEST IN R



Recently I needed to compare two proportions using R. I tried with the prop.test  function, but I noticed that it performs a Chi square test. Unfortunately, it seems that there is not a function in R for two-proportion z-test and I had to write it on my own.

I  found only a post in R bloggers , in which the function allows to calculate the z-score. Hence, also using in part a function for one-proportion z-test,  I tried to improve it for getting the estimate of the two proportions, the p-value and the confidence interval (95%) for the difference.



Given that:



where:

      p1= proportion of subjects with the characteristic of interest in the 1st group (x1/n1)
      p2= proportion of subjects with the characteristic of interest in the 2nd group (x2/n2)


and:

Confidence interval of the difference:


where SE of the difference:



The function becomes:

z.prop = function(x1,x2,n1,n2,conf.level=0.95,alternative="two.sided"){
  numerator = (x1/n1) - (x2/n2)
  p.common = (x1+x2) / (n1+n2)
  denominator = sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
  z.prop.ris = numerator / denominator
  p1 <- x1/n1
  p2 <- x2/n2
  p.val <- 2*pnorm(-abs(z.prop.ris))
  if(alternative=="lower") {
            p.val <- pnorm(z.prop.ris)
         }
         if(alternative=="greater") {
            p.val <- 1 - (pnorm(z.prop.ris))
         }
  SE.diff <- sqrt(((p1*(1-p1)/n1)+(p2*(1-p2)/n2)))
  diff.prop <- (p1-p2)
  CI <- (p1-p2)+ c(
          -1*((qnorm(((1 - conf.level)/2) + conf.level))*SE.diff),
         ((qnorm(((1 - conf.level)/2) + conf.level))*SE.diff) )

  return (list(estimate=c(p1,p2),ts.z=z.prop.ris,p.val=p.val,diff.prop=diff.prop,CI=CI))

}



It gives you back a two-tailed p-value when the alternative is not specified.

For example, let’s say that you want to compare the proportion of Helicobacter infection in two groups of individuals: the first is composed by people with gastritis and the second one by healthy individuals.

Group 1 (gastritis): n1=500
Group 2 (healthy): n2=470

Group 1 with Helicobacter infection: x1=350
Group2 with Helicobacter infection: x2=295

It will be:

z.prop(350, 295, 500, 470)   #two-sided


This is what you get back:

$estimate
[1] 0.7000000 0.6276596

$ts.z
[1] 2.385499

$p.val
[1] 0.01705595

$diff.prop
[1] 0.07234043

$CI
[1] 0.0129810 0.1316999


You can also specify the alternative (“lower” or “greater”) for performing  one-tailed test.
Let me know if it works for you or how to improve it.