Re: [S] Problem reading large files.

Douglas Bates (bates@stat.wisc.edu)
16 Oct 1998 08:52:07 -0500


>>>>> "Andrey" == andrey <andrey@utstat.toronto.edu> writes:

andrey> Hi! I am running S-PLUS Version 3.4 Release 1 for Silicon
andrey> Graphics Iris, IRIX 5.3. I have a large ASCII file, over
andrey> 100,000 lines long, with one number on each line:
andrey>     0.766007
andrey>     0.916291
andrey>     0.924875
andrey>     0.924710
andrey>     0.914394
andrey>     0.042228
andrey> etc etc... When I try to read this in:
andrey>     my.vector _ scan("my.file")
andrey> I get the error message:
andrey>     Error in scan....Cannot allocate 5242880 bytes:
andrey>     options("object.size") is 5000000: see options help file...

andrey> So I did: options(object.size=10000000) and even much larger
andrey> numbers, but s-plus keeps complaining; e.g.:

andrey> Cannot allocate 167772160 bytes: options("object.size") is
andrey> 100000000: see options help file and so on.

andrey> If I use much higher than 10000000, it seems to jam our
andrey> shared UNIX facility.

andrey> 100,000 is not that large, so it should be possible to read
andrey> this in, convert it to a matrix, etc.

andrey> Any suggestions?

One possibility is to put the data into a file in the format written
by data.dump and read it in through data.restore. Be careful when
doing this. The reading of numeric objects in data.restore is much
more efficient than in scan _but_ there is no error checking. If you
make even a trivial mistake in the format of the file you may cause
S-PLUS to crash.

In an S-PLUS session, try
S> ttt <- rnorm( 1000 )
S> data.dump( "ttt" )
[1] "dumpdata"
then look at the file "dumpdata". It is in ASCII. It begins
---- beginning of dumpdata (for my set of random numbers) ----
ttt
numeric
1000
0.10868963287780616
1.049256668512627
-0.52032980986645205
-1.4189525554218338
...
----

You already have the numbers themselves in the required format. You
need only prepend three lines: the object name, the string "numeric",
and the length of the object. Make sure you get the length right! On a
Unix system I would run the "wc" program (word count; "wc -l" reports
the line count) on the original file to verify the number of lines.
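
The prepending step can be sketched as a short shell script. This is
only a sketch under assumptions: "my.file" is the input name from the
original question, while the output file name "bigObject" and the
object name "my.vector" are chosen here for illustration. The three
header lines follow the layout of the "dumpdata" excerpt above and
nothing more is assumed about the format. The script counts and copies
only non-empty lines, so the length written in the header always
matches the number of values that follow it.

```shell
#!/bin/sh
# Sketch: wrap a one-number-per-line file in the three-line header
# (object name, "numeric", length) seen in the dumpdata excerpt.
# File and object names below are assumptions for illustration.
set -eu

infile="my.file"       # input, one number per line
outfile="bigObject"    # file to hand to data.restore (assumed name)
objname="my.vector"    # name the object should get in S-PLUS (assumed)

# Demo only: create a tiny stand-in input if the real file is absent.
if [ ! -f "$infile" ]; then
    printf '0.766007\n0.916291\n0.924875\n' > "$infile"
fi

# Count the non-empty lines; this is the length that goes in the header.
n=$(awk 'NF { n++ } END { print n + 0 }' "$infile")

# Write the header, then the non-empty data lines unchanged.
{
    printf '%s\n' "$objname"
    printf '%s\n' "numeric"
    printf '%s\n' "$n"
    awk 'NF' "$infile"
} > "$outfile"

# Show the start of the result for a visual check against dumpdata.
head -n 5 "$outfile"
```

Comparing the count in the third line of the output with "wc -l" on
the original file is a cheap way to catch blank or truncated lines
before S-PLUS ever sees the file.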

Then, with the edited file saved as "bigObject", use
S> data.restore("bigObject")