[S] How to attach a data frame from within a function without copying--SUMMARY

Terry Elrod (Terry.Elrod@UAlberta.ca)
Fri, 8 May 1998 07:48:11 -0600


My thanks to Prof. Brian Ripley [ripley@stats.ox.ac.uk], Don MacQueen [macq@llnl.gov] and Jens Oehlschlaegel [oehl@Psyres-Stuttgart.DE] for their private replies to my query. I provide Prof. Ripley's reply in full below, followed by my original query.

A SURE SOLUTION:
All pointed out that passing the data frame name as a string argument to the function is the surest way to avoid creating an extra copy of the data frame in the function's evaluation frame, although passing it unquoted and using deparse(substitute()) "ought to be ok" too. The essential trick for avoiding an extra copy of data = df is to replace data with get(dsn) in the call to attach. This way, although a copy of the data must be created (or at least referred to) in get's evaluation frame, that frame is dismantled as soon as the call returns, and no copy of (or pointer to) the data is left in the function's own frame. See Prof. Ripley's careful explanation below.

More than one responder also reminded me that the first version of S+ to incorporate S4 (which won't be S+ 4.5, I believe) ought to do a better job of handling memory usage, in particular by allowing better control over the copying of data sets from within functions.
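
A minimal sketch of the string-name approach, written in R syntax rather than S-PLUS 3.x/4.x and not taken from the replies. The helper name attach_by_name and the demo data frame hills_demo are hypothetical; S-PLUS's attach() and detach() take somewhat different arguments (use.names, save), so treat this as an illustration of the idea only.

```r
# Hypothetical helper: receives the NAME of the data frame as a string,
# so the data frame itself is never bound as an argument in this frame.
attach_by_name <- function(dsn) {
    stopifnot(is.character(dsn), length(dsn) == 1)
    # get() looks the object up by name; its value is handed straight to attach().
    attach(get(dsn), name = dsn)
    # Undo the attach when the function exits, whatever happens afterwards.
    on.exit(detach(dsn, character.only = TRUE))
    # Report what is bound locally and whether the data frame is on the search path.
    list(local.names = ls(), attached = dsn %in% search())
}

# Hypothetical demo data, defined at top level so get() can find it.
hills_demo <- data.frame(miles = c(2.5, 6, 6), feet = c(650, 2500, 900))
res <- attach_by_name("hills_demo")
print(res$local.names)   # names bound in the function's own frame
print(res$attached)      # was the data frame on the search path during the call?
```

The only object bound in the function's frame is the string dsn; whether the attached object is a copy or a reference is still up to the interpreter.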

PERHAPS NO PROBLEM:
Prof. Ripley also points out below that the appearance of data in a frame does not mean that a copy of the data appears there---it may be only a reference. Whether data sets are copied is hard to determine or control in S+.

A LIBRARY FOR DATA REFERENCES:
Jens Oehlschlaegel [oehl@Psyres-Stuttgart.DE] posted to this list some time ago an announcement of a library he has developed that allows greater control over whether data are copied or referenced. With his permission I repeat his kind invitation to send his library on request. I have not yet had a chance to use it. He has also included a list of bugs that arose under S+Win 3.3 when using frame/where = 0 from within functions. He notes he has had no opportunity to test his library under S+Win4.

ALTERNATIVE APPROACHES:
Since posting my query, I have turned to evaluation frames as an alternative to attaching data frames. Handiest is using assign with frame=1 or where=0. Assignments to the former die upon return to the command line, while the latter die only with the end of the session. This is S+'s closest equivalent to allowing global variables. I have also tried using move.frame to work with new.frame, without success.
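
A rough sketch of the same idea in R syntax, under a stated assumption: R has no frame=1 or where=0 arguments to assign(), so a dedicated environment (the hypothetical session_store) stands in for the session-level database. The names fit_step and survey are likewise made up for illustration.

```r
# Hypothetical session-level store; stands in for S-PLUS's where = 0.
session_store <- new.env()

# Store the data once, under a name, outside any function's frame.
assign("survey", data.frame(id = 1:4, score = c(3, 5, 2, 4)),
       envir = session_store)

# Functions then receive only the name and look the object up when needed.
fit_step <- function(name) {
    dat <- get(name, envir = session_store)
    nrow(dat)
}

n <- fit_step("survey")
print(n)
```

Objects in such a store persist until removed or until the session ends, much as described above for where=0.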

Again, my thanks to those who responded to this query.

Terry Elrod
--------
Terry Elrod; Assoc. Prof. of Marketing; 3-23 Fac. of Business; Univ. of Alberta;
Edmonton AB; Canada T6G 2R6
email: Terry.Elrod@Ualberta.ca; tel: (403) 492-5884; fax: (403) 492-3325
Web page: http://www.ualberta.ca/~telrod/
--------

-----Original Message-----
From: Prof Brian Ripley [SMTP:ripley@stats.ox.ac.uk]
Sent: Sunday, April 12, 1998 4:13 AM
To: Terry Elrod
Subject: Re: [S] How to attach a data frame from within a function without copying

Terry Elrod wrote:
>
> Here's a question for the experts. A good answer would be very useful to many, I think.

I think this is in fact esoteric, and the answer may depend on the version
of S-PLUS in use! In particular, 5.x/6.x will probably work differently.

>
> It's often useful to attach a data frame at the command level--accessing its variables becomes more convenient and faster. These same benefits are also obtained from within functions.
>
> I am writing a model-fitting function with a data argument that may name a data frame. I would like to have the function attach the data frame without passing a copy of the data to the function. This is not as easy as it might appear.
>
> Here's a straightforward way to attach and detach a data frame from within a function, but a copy of the data is passed to the function and remains there.
>
> ********************
> >func <-
> function(data)
> {
> dsn <- deparse(substitute(data))
> attach(data, name = dsn, use.names = F)
> on.exit(detach(what = dsn, save = F), add = T)
> sys.frame() # This line is added to show what is stored in func's frame
> }
> > val <- func(datafrm) # example with a small data frame
> > names(val)
> [1] ".Auto.print" "data" "dsn" "data"
> # ... shows two components of sys.frame with name "data". which means the data copied ...
> > dim(val[[2]])
> [1] 481 32
> # yup, the data were copied all right.

I am afraid that is not obvious to me. How do you know they were copied,
as distinct from a pointer being created? And poking around in the
function frame could of itself cause a copy! I think data is being copied,
but possibly only at the return.

I don't see how to avoid the data frame being copied, but I would
expect that copy to be by reference. Whether it is depends on the
version of S-PLUS in use, and I am sure that Svr4 should just make a
reference. S-PLUS 3.0 would not. As for 3.4 or 4.0, probably only Bill
Dunlap knows.

I think this is easier if you pass the name of the data frame,
that is arrange to do the deparse(substitute(data)) in the caller.
However, the following seems to work:

func <- function(data)
{
    dsn <- deparse(substitute(data))
    attach(get(dsn), name = dsn, use.names = F)
    on.exit(detach(what = dsn, save = F), add = T)
    print(names(sys.frame()))
    print(sys.frame())
    print(search())
    invisible()
}
func(hills)
[1] ".Auto.print" "dsn" "data"

$.Auto.print:
[1] F

$dsn:
[1] "hills"

$data:
.Argument(hills, data = )

[1] "/home/ripley/.Data"
[2] "hills"
[3] "/opt1/splus3.4/splus/.Functions"
[4] "/opt1/splus3.4/s/.Functions"
[5] "/opt1/splus3.4/s/.Datasets"
[6] "/opt1/splus3.4/stat/.Datasets"
[7] "/opt1/splus3.4/splus/.Datasets"
[8] "/opt1/splus3.4/library/trellis/.Data"
[9] "/opt1/splus3.4/library/MASS/.Data"

with no sign of a copy. Actually, I think you can just use attach.default
and omit use.names = F. Note that this _has_ made a copy to be attached,
and that is necessary. (If the object was already around and found in a
parent frame I think it is still copied in current versions of S.) By
using get I have ensured that the copy is made in a transient frame for
get. There are other ways to do that! It is possible that this may
fall foul of the scope rules, that is get() will not be able to see `data'
although the caller function could. It will also not work if data is an
expression, but then it does need to be evaluated.
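
The two caveats in the preceding paragraph (scope, and an expression passed as the argument) can be checked up front. This is a sketch in R syntax, not part of the original reply; resolve_name and demo_df are hypothetical names.

```r
# Hypothetical guard: turn an unquoted argument into a name string,
# refusing expressions and names that cannot be found from here.
resolve_name <- function(data) {
    expr <- substitute(data)          # inspects the argument without evaluating it
    if (!is.name(expr))
        stop("pass a bare object name, not an expression")
    dsn <- deparse(expr)
    if (!exists(dsn))                 # lookup starts here, not in the caller's frame
        stop(paste(dsn, "not found"))
    dsn
}

demo_df <- data.frame(x = 1:3)        # hypothetical top-level data frame
ok <- resolve_name(demo_df)
bad <- tryCatch(resolve_name(demo_df[1:2, ]),
                error = function(e) conditionMessage(e))
print(ok)
print(bad)
```

Because exists() searches from the guard's own frame, a data frame that lives only in the calling function's frame would be reported as not found, which is the scope problem mentioned above.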

Hope this helps somewhat. I don't see a bullet-proof solution that is
better than removing the copy, but if the data frame is on the working
database, this one may help.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-----Original Message-----
From: Terry Elrod [SMTP:Terry.Elrod@UAlberta.ca]
Sent: Saturday, April 11, 1998 7:52 PM
To: 's-news@wubios.wustl.edu'
Subject: [S] How to attach a data frame from within a function without copying the data?

Here's a question for the experts. A good answer would be very useful to many, I think.

It's often useful to attach a data frame at the command level--accessing its variables becomes more convenient and faster. These same benefits are also obtained from within functions.

I am writing a model-fitting function with a data argument that may name a data frame. I would like to have the function attach the data frame without passing a copy of the data to the function. This is not as easy as it might appear.

Here's a straightforward way to attach and detach a data frame from within a function, but a copy of the data is passed to the function and remains there.

********************
>func <-
function(data)
{
    dsn <- deparse(substitute(data))
    attach(data, name = dsn, use.names = F)
    on.exit(detach(what = dsn, save = F), add = T)
    sys.frame() # This line is added to show what is stored in func's frame
}
> val <- func(datafrm) # example with a small data frame
> names(val)
[1] ".Auto.print" "data" "dsn" "data"
# ... shows two components of sys.frame with name "data". which means the data copied ...
> dim(val[[2]])
[1] 481 32
# yup, the data were copied all right.
********************

The problem, of course, is that func() evaluates data in its call to attach(). It is possible to remove the data frame from the function frame by adding the call:

    remove("data", frame = sys.nframe())

immediately after attaching the frame. While we're at it, it's also possible to attach the data frame only if it is not already attached, and to check for whether the object exists on the search path. The (tested) result is the following:

func <- function(data)
{
    dsn <- deparse(substitute(data))
    if(!exists(dsn))
        stop(paste(dsn, " not found on search path.", sep = ""))
    else if(match(dsn, search(), nomatch = 0))
        warning(paste(dsn, " already attached", sep = ""))
    else {
        attach(data, name = dsn)
        remove("data", frame = sys.nframe())
        on.exit(detach(what = dsn, save = F))
    }
    #. . .
    # rest of func follows here
    #. . .
}

However a copy of the data frame is still passed to func. One would hope there is a way to attach a data frame from within a function without copying the entire frame to the function in the first place.

My experiments with various combinations of substitute, deparse and as.name have not proven successful. I am certain code that accomplishes what is contained in func without passing the data to func would get a lot of grateful use from readers of this list.

It would of course be most handy if a function, say attach.df(), could contain the code so that all this could be accomplished by simply inserting the line attach.df(data) into the calling function, but this is perhaps too much to hope for. I see no harm in assuming, as I have in func, that the object is attached under its own name.

Any fairy godparents out there willing and able to help with this one?

Terry Elrod


----------------------------------------------------------------------- This message was distributed by s-news@wubios.wustl.edu. To unsubscribe send e-mail to s-news-request@wubios.wustl.edu with the BODY of the message: unsubscribe s-news
