lunes, 8 de agosto de 2016

Reading R Code. The function Reduce in R (part II)

[ This post is the third one in a series about Reduce. To understand it you also need to read the first and second posts in this series.]

Loops

Our implementation of Reduce directly translates the recursive procedures capturing the fold-left and fold-right computations. I think that this kind of recursive thinking is not only more high-level but also easier to understand, and that was the reason for beginning with them. However, in languages that don't provide mechanisms for optimizing recursion, iteration is always expressed via loops or another non-recursive procedure.

We have to do so in R if we hope to handle sequences of certain length, otherwise we'll get something like this:

> source("my_reduce.R")
> my_reduce("+", 1:100000)
Error: evaluation nested too deeply ...

Let's get started with converting the higher-level recursion used so far into a lower-level kind of loop for our first simple implementation in order to see how this can be easily done for the rest of the function.

That first implementation was for fold-left with a given initial value:

my_reduce <-
function(f, x, init) {
  f <- match.fun(f)

  iter <- function(acc, x) {
    len <- length(x)
    if (!len) {
      acc
    }
    else {
      first <- x[[1L]]
      rest <- tail(x, -1L)

      iter(f(acc, first), rest)
    }
  }

  iter(init, x)
}

Using a for loop instead of recursion we get this self-explanatory version:

my_reduce <-
function(f, x, init) {
  f <- match.fun(f)
  acc <- init

  for (e in x) {
    acc <- f(acc, e)
  }
  acc
}

Accessing elements by its index is another way to get the same thing. The refactored function based on indices would be:

my_reduce <-
function(f, x, init) {
  f <- match.fun(f)
  acc <- init
  len <- length(x)
  ind <- seq_len(len)

  for (i in ind) {
    acc <- f(acc, x[[i]])
  }
  acc
}

seq_len is an R function that takes a number and produces an integer vector from 1 to that number (included). So, seq_len(4) produces the sequence 1, 2, 3, 4. ind in the code above just refers to the numerical indices of x to be used in the loop.

A cosmetic and optional detail. Since the local variable acc takes the value passed as init, and the initial init value won't be used any more, we can stick with init as the name for the accumulated value so far, and remove acc:

my_reduce <-
function(f, x, init) {
  f <- match.fun(f)
  len <- length(x)
  ind <- seq_len(len)

  for (i in ind) {
    init <- f(init, x[[i]])
  }
  init
}

Reduce, at last!

That's actually the R implementation for left folding with init given:

function(f, x, init) {
  len <- length(x)
  f <- match.fun(f)
  ind <- seq_len(len)

  for (i in ind) init <- forceAndCall(2, f, init, x[[i]])
  init
}

The only subtle difference lies in how f is called. Instead of a direct application, it uses the forceAndCall R function. The purpose of this is to avoid delayed evaluation of arguments. That means that we force R to evaluate the first 2 arguments (note the 2 as first argument) of f immediately before executing the f body, rather than leaving R to evaluate them after. This is rather technical at this point and we can go ahead without fully understanding. Just note that this construct is like calling the function but in a better way when that function is to be the value passed to a higher-order function like Reduce. We might go deeper into this in the future.

Once we have transformed our initial version into a loop-based version the first half of Reduce is now easy to understand.

 1 Reduce <-
 2 function(f, x, init, right = FALSE, accumulate = FALSE)
 3 {
 4   mis <- missing(init)
 5   len <- length(x)
 6 
 7   if(len == 0L) return(if(mis) NULL else init)
 8 
 9   f <- match.fun(f)
10 
11 
12   if(!is.vector(x) || is.object(x))
13     x <- as.list(x)
14 
15   ind <- seq_len(len)
16 
17   if(mis) {
18     if(right) {
19       init <- x[[len]]
20       ind <- ind[-len]
21     }
22     else {
23       init <- x[[1L]]
24       ind <- ind[-1L]
25     }
26   }
27 
28   if(!accumulate) {
29     if(right) {
30       for(i in rev(ind))
31         init <- forceAndCall(2, f, x[[i]], init)
33     }
34     else {
35       for(i in ind)
36         init <- forceAndCall(2, f, init, x[[i]])
37     }
38     init
39   }
40   else {
..     # to be explained later
..   }
.. }

Apart from pretty minor differences, it is the same code as in our implementation but using indexing and looping instead of recursion.

Let's see:

  • Line 7 is just the code that handles what happens if the given object is empty, the empty list or an empty vector, typically. It returns NULL if no initial value is given; otherwise, returns the initial value.

  • Lines 12-13 introduce one new thing only present in the official R version. Our version assumes a list as input. In R, a vector is expected, These lines ensure that if the value is not a vector or is an object in general it is converted to a type that the following code can deal with, a list.

  • Lines 17-26 initialize variables init and ind accordingly to the kind of fold (left or right) requested if no initial value were provided. init is set exactly as in our version. Regarding ind, instead of working on the rest of the list directly as in our recursive implementation, indices are initialized appropriately, ind[-1L] produces the next indices to the index of the first element in order to process the rest of list for left folding. On the other hand ind[-len] produces the indices for right folding: those of all elements save the last one (that has been used as init). These indices will be reverted in line 30 to process the remaining list from right to left as explained in the previous post.

  • Finally [lines 29-37], the function f is iteratively called with the proper
    order of parameters for each kind of fold, and the result of folding returned in line 38.

The rest of Reduce is the code handling the case were the caller asks for a sequence of partial results rather than the final reduction. We'll look at it in the next post.

No hay comentarios:

Publicar un comentario