Module 11: Debugging R Functions

For this assignment, I worked on debugging a function that identifies outlier rows in a numeric matrix using the Tukey rule. The original code had a bug that I had to find and fix.

The Original Buggy Code

The function was supposed to flag rows where every column contains an outlier. Here's what I started with:

tukey.outlier = function(x) {
  Q1 = quantile(x, 0.25, na.rm = TRUE)
  Q3 = quantile(x, 0.75, na.rm = TRUE)
  IQR = Q3 - Q1
  lower = Q1 - 1.5 * IQR
  upper = Q3 + 1.5 * IQR
  return(x < lower | x > upper)
}

tukey_multiple = function(x) {
  outliers = array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    outliers[, j] = outliers[, j] && tukey.outlier(x[, j])
  }
  outlier.vec = vector("logical", length = nrow(x))
  for (i in 1:nrow(x)) {
    outlier.vec[i] = all(outliers[i, ])
  }
  return(outlier.vec)
}


Running tukey_multiple on a matrix with 10 rows gave this error:

Error in outliers[, j] && tukey.outlier(x[, j]) :
  'length = 10' in coercion to 'logical(1)'

What Went Wrong

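The problem was on line 4 of tukey_multiple, inside the first for loop, where I used && instead of &. The && operator expects a single TRUE or FALSE on each side, but here both sides were logical vectors of 10 elements, one per row of the matrix. Recent versions of R (4.3.0 and later) treat that as an error instead of quietly using only the first element, which is why the function stopped.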
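To see the difference outside the function, here is a small sketch with two made-up logical vectors (a and b are illustration values, not part of the assignment). The vector case is wrapped in tryCatch because what && does with vectors depends on the R version: it is an error in R 4.3.0 and later, while older versions warned or silently used only the first elements.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# & compares element by element and returns a vector of the same length
elementwise <- a & b
print(elementwise)   # TRUE FALSE FALSE

# && takes one value on each side and returns one value
scalar <- a[1] && b[1]
print(scalar)        # TRUE

# With full vectors the result depends on the R version, so capture
# whatever happens (error message, warning message, or a value)
vector_result <- tryCatch(
  a && b,
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(vector_result)
```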

The Fix

I changed && to & for element-wise comparison:

corrected_tukey = function(x) {
  outliers = array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    outliers[, j] = outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec = vector("logical", length = nrow(x))
  for (i in 1:nrow(x)) {
    outlier.vec[i] = all(outliers[i, ])
  }
  return(outlier.vec)
}

Testing the Fix

After fixing the bug, I tested it again:

set.seed(123)
test_mat = matrix(rnorm(50), nrow = 10)
corrected_tukey(test_mat)

 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [8] FALSE FALSE FALSE


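The function now runs without errors. The output shows that none of the 10 rows has an outlier in every column, which is what I would expect here: with independent random normal values, it is very unlikely for one row to hold an outlier in all five columns at once.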
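An all-FALSE result on random data shows that the error is gone, but not that the logic is right. A quick extra check is to build a small matrix where the answer is known in advance. The sketch below repeats the two functions so it runs on its own; the matrices both_cols and one_col are made-up test data, not part of the original assignment.

```r
tukey.outlier <- function(x) {
  Q1 <- quantile(x, 0.25, na.rm = TRUE)
  Q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- Q3 - Q1
  unname(x < Q1 - 1.5 * iqr | x > Q3 + 1.5 * iqr)
}

corrected_tukey <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector("logical", length = nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}

# Row 10 is extreme in both columns, so only row 10 should be flagged
both_cols <- cbind(c(1:9, 100), c(1:9, 200))
flag_both <- corrected_tukey(both_cols)
print(flag_both)

# Row 10 is extreme in the first column only, so no row should be flagged
one_col <- cbind(c(1:9, 100), 1:10)
flag_one <- corrected_tukey(one_col)
print(flag_one)
```

For the values 1 to 9 plus 100, the quartiles are 3.25 and 7.75, so the upper fence is 7.75 + 1.5 * 4.5 = 14.5, and 100 and 200 both fall well outside it.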

What I Learned

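This debugging exercise taught me the difference between the logical operators in R. Use && and || for single values, such as the condition of an if statement, and & and | for element-wise comparison of vectors. Reading the error message carefully also helped: it named the exact expression that failed and the length of the vector involved.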
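As a small illustration of where && fits naturally, here is a sketch of a hypothetical helper (has_large_value is a name I made up for this example). Each piece of the condition is a single value, and any() collapses the vector comparison to one TRUE or FALSE before && sees it.

```r
# Hypothetical helper: does a numeric vector contain a value above a limit?
has_large_value <- function(x, limit) {
  # is.numeric(x) and length(x) > 0 are single values, so && is appropriate;
  # x > limit is a vector, so any() reduces it to one value first
  is.numeric(x) && length(x) > 0 && any(x > limit, na.rm = TRUE)
}

print(has_large_value(c(1, 5, 9), 8))   # TRUE
print(has_large_value(numeric(0), 8))   # FALSE
print(has_large_value("a", 8))          # FALSE, && stops at is.numeric(x)
```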
