sábado, 6 de agosto de 2016

Reading R Code. The function Reduce in R (part I)

[This post is the second one about the function Reduce. It assumes everything discussed in the first post about Reduce]


The function Reduce in R is based on the Lisp function reduce. It also adds an option borrowed from Haskell's scan functions.

Let's see this by exploring a few simple examples and observing their behavior in R as well as in Lisp and Haskell.

What does Reduce do?

## Example 1.
# R
> Reduce(`+`, 1:4)
[1] 10

# Lisp
[1]> (reduce #'+ '(1 2 3 4))
10

# Haskell
Prelude> foldl (+) 1 [2, 3, 4]
10

Look at the Haskell function. I have used foldl, that is the fold-left function in Haskell, because R and Lisp do fold-left by default. Note also that while in Haskell the initial value has to be specified, in R and in Lisp we don't need to do so. When not specified the initial value for left-folding is the first element in the given list.

The evaluation of all these functions (fold-left) expands as follows for the given example:

(((1 + 2) + 3) + 4)


## Example 2.
# R
> Reduce(`-`, 1:4)
[1] -8

# Lisp
[1]> (reduce #'- '(1 2 3 4))
-8

# Haskell
Prelude> foldl (-) 1 [2, 3, 4]
-8

The expansion is the same:

(((1 - 2) - 3) - 4)


## Example 3.
# R
> Reduce(`-`, 1:4, right = TRUE)
[1] -2

# Lisp
[1]> (reduce #'- '(1 2 3 4) :from-end t)
-2

# Haskell
Prelude> foldr (-) 4 [1, 2, 3]
-2

The evaluation expands to this now:

(1 - (2 - (3 - 4)))

This is fold-right. In R and Lisp we have to set an option for this fold, and at this point it should be clear why these two functions are described as "left" and "right" folds. Note also that, for fold-right the initial value, if given (as in Haskell) become the right-most item in the expansion. In fact, in R and Lisp if the initial value is not given, it is the last element of the given list.

R provides an extra feature borrowed from Haskell for producing a sequence of results of accumulated values rather than the final reduction.

In Haskell, there is a special set of functions for this purpose, scan functions, scanr corresponds to foldr and scanl to foldl:

Prelude> scanl (-) 1 [2, 3, 4]
[1,-1,-4,-8]

Observe the partial results in the computations above:

initial value:        =  1
1st result   :  1 - 2 = -1
2nd result   : -1 - 3 = -4
3rd result   : -4 - 4 = -8

Or, for a right fold:

Prelude> scanr (-) 4 [1, 2, 3]
[-2,3,-1,4]

where partial results from right to left and in reverse order are:

initial value:          =  4
1st result   : 3 - 4    = -1
2nd result   : 2 - (-1) =  3
3rd result   : 1 - 3    = -2

The R equivalents need the accumulate parameter set to TRUE:

Reduce("-", 1:4, accumulate = TRUE)
> [1]  1 -1 -4 -8

and

Reduce("-", 1:4, right = TRUE, accumulate = TRUE)
> [1] -2  3 -1  4

These functions are just variants of the tail-recursive pattern where a second accumulator is added to keep track of computed values so far.

So for instance scanl can be implemented as follows:

 (define (scan-left f i lox0)
  (local [(define (f-for-lox-acc acc rsf lox)
            (cond [(empty? lox) (reverse (cons acc rsf))]
                  [else
                    (f-for-lox-acc (f acc (first lox))
                                   (cons acc rsf)
                                   (rest lox))]))]
    (f-for-lox-acc i empty lox0)))

Only and extra accumulator rsf has been added. It maintains a list of resulting values so far. Each time a new computed value is produced is inserted into the rsf accumulator. The result is finally reversed. It is possible to avoid the final reverse if computed values are appended to the end of rsf. Anyway it is clear the idea behind this.

R implementation of Reduce

As we have seen, Reduce in R is a multipurpose function. Other languages as Haskell prefer independent elementary functions for each task (right/left fold and returning accumulated values or not). I find the latter more elegant, easier to read and easier to test. R, though, is fond of functions with lots of parameters, and full of conditionals to select the code to execute for a specific purpose.

Even so, and with the goal of firstly implementing our own version for better understanding the R official version, it is reasonable to begin with basic use cases and add more after those have been implemented.

One of the simplest use case is to execute Reduce for left fold with a given initial value. As usual, we need a very basic set of tests. (more should be added for good coverage). The following are hopefully enough at this stage:

test_that("test reduce: fold-left, init given", {
  expect_equal(my_reduce("+", integer(0), init = 1), 1)
  expect_equal(my_reduce("-", 2:4, init = 1), -8)
  expect_equal(my_reduce("/", c(2, 9, 13), init = 7), 7/234)
  expect_equal(my_reduce(list, 2:4, init = 1), (list(list(list(1, 2), 3), 4)))
})

The most natural implementation at this point is the translation into R of the fold-left tail recursive procedure (first variant), as this:

my_reduce <-
function(f, x, init) {
  f <- match.fun(f)

  iter <- function(acc, x) {
    len <- length(x)
    if (!len) {
      acc
    }
    else {
      first <- x[[1L]]
      rest <- tail(x, -1L)

      iter(f(acc, first), rest)
    }
  }

  iter(init, x)
}

Assuming that test cases are saved into the file test_my_reduce.R, running tests with:

> library(testthat)
> test_file("test_my_reduce.R")

confirms that this implementation works.

A few points about this code. The first line applies match.fun as customary in R code to check the validity of the argument passed as f. I have used 1L instead of just 1 for indexing. This is a common practice in R base code: when an integer is required the number as a literal integer (the number followed by L) is passed. Finally, I have intentionally named the auxiliary helper as iter for reasons that will be clear. Any name would have worked though.

The next step is handling the case where no initial value is passed. Some test cases for this:

test_that("test reduce: fold-left, init missing", {
  expect_equal(my_reduce("+", integer(0)), NULL) 
  expect_equal(my_reduce("-", 1:4), -8)
  expect_equal(my_reduce("/", c(7, 2, 9, 13)), 7/234)
  expect_equal(my_reduce(list, 1:4), (list(list(list(1, 2), 3), 4)))
})

The first test case is for the empty vector. As no initial value is given, we cannot return it as before. By convention we return NULL in this case.

And this is an obvious implementation that passes all tests above [only the first lines are added, the iter function remains the same]:

my_reduce <-
function(f, x, init) {
  f <- match.fun(f)
  mis <- missing(init)
  len <- length(x)

  if (mis && !len) {
    return(NULL)
  }

  if (mis) {
    init <- x[[1L]]
    x <- tail(x, -1L)
  }

  iter <- function(acc, x) {
    len <- length(x)
    if (!len) {
      acc
    }
    else {
      first <- x[[1L]]
      rest <- tail(x, -1L)

      iter(f(acc, first), rest)
    }
  }

  iter(init, x)
}

What about the fold-right operation?

Let's write some tests for fold-right with and without the initial value given:

test_that("test reduce: fold-right, init given", {
  expect_equal(my_reduce("+", integer(0), init = 1, right = TRUE), 1)
  expect_equal(my_reduce("-", 1:3, init = 4, right = TRUE), -2)
  expect_equal(my_reduce("/", c(7, 2, 9), init = 13, right = TRUE), 63/26)
  expect_equal(my_reduce(list, 1:3, init = 4, right = TRUE), 
           list(1, list(2, list(3, 4))))
})

test_that("test reduce: fold-right, init missing", {
  expect_equal(my_reduce("+", integer(0), right = TRUE), NULL)
  expect_equal(my_reduce("-", 1:4, right = TRUE), -2)
  expect_equal(my_reduce("/", c(7, 2, 9, 13), right = TRUE), 63/26)
  expect_equal(my_reduce(list, 1:4, right = TRUE), 
           list(1, list(2, list(3, 4))))
})

As for the implementation it looks like it would be a very different piece of code if we were to translate the natural recursive procedure examined in the post referred above. However, when designing functions that handle several possibilities we should seek after maximizing the amount of code shared by each of them. The best possible scenario for the function at hand would be one in which the tail-recursive pattern employed for fold-left is shared by the fold-right operation. It turns out that we already have a promising candidate, the second variant of fold-left. Remember that it has the same signature as fold-right, and it shouldn't be difficult to adapt it so that it serves as a tail-recursive version for fold-right. A closer examination into the expansions of both procedures gives the answer. Let's recall them once again:

fold-right   (-) 1 '(2 3 4) : 2 - (3 - (4 - 1))
fold-left-2  (-) 1 '(2 3 4) : 4 - (3 - (2 - 1)) 

The only difference lies in the order of elements in the given list. The function we are searching for (a tail-recursive fold-right) should have the same expansion as fold-right:

fold-right-2 (-) ??         : 2 - (3 - (4 - 1))

Which arguments should it receive? Obviously:

fold-right-2 (-) 1 '(4 3 2)

Hence, fold-right-2 is just fold-left-2 with the input list reversed, and, as you surely remember, the only remaining difference between our fold-left, the one employed in R, fold-left-1 and fold-left-2 is the order of arguments passed to f.

All of these points are captured by the following implementation [number of lines included]:

1  my_reduce <-
2  function(f, x, init, right = FALSE) {
3    iter <- function(acc, x) {
4      len <- length(x)
5      if (!len) {
6        acc
7      }
8      else {
9        first <- x[[1L]]
10       rest <- tail(x, -1L)
11
12       if (right) iter(f(first, acc), rest) else iter(f(acc, first), rest)
13     }
14   }
15  
16   f <- match.fun(f)
17   mis <- missing(init)
18   len <- length(x)
19 
20   if (mis && !len) {
21     return(NULL)
22   }
23
24   if (mis) {
25     if (right) {
26       init <- x[[len]]
27       x <- rev(head(x, -1L))
28     }
29     else {
30       init <- x[[1L]]
31       x <- tail(x, -1L)
32     }
33   }
34   else {
35     if (right) {
36       x <- rev(x)
37     }
38   }
39 
40   iter(init, x)
41 }

Changes with respect to the previous implementation are:

  1. The nested function iter has been modified so that the recursive call passes parameters to f in the order suitable to the requested kind of fold [line 12].
  2. The handling of options for the arguments init and right has been appropriately extended [lines 24-38]. Note, in particular, that when right is TRUE we need to reverse the sequence, as explained. The R function rev does the job [lines 27, 36].

This implementation passes our current test set. Nice!

Do you think that this function is getting long? If so you are with me. As I said before I prefer shorter functions with as few parameters as possible and independent helpers. But R base code is plenty of this kind of monolithic functions rich in parameters and conditional branches.

[To be continued]

No hay comentarios:

Publicar un comentario