Re: objects(pattern=...) and parsing of character strings

Alan Zaslavsky (zaslavsk@hcp.med.harvard.edu)
Mon, 19 Jan 1998 08:36:01 -0500 (EST)


> From: Brett Presnell <presnell@maths.anu.edu.au>
.
.
.
> This in turn is caused because paste() is eating the "\" in "\.":
>
> > paste("(","^S\.",")",sep="",collapse="|")
> [1] "(^S.)"
>
> Actually, I suppose the S parser is eating the "\" before it ever gets
> to paste(). Something about this seems like a bug to me, but I'm not
> always sure what I should expect from character operations in S. Note
> that S eats the "\" in "\.","\*","\^","\$","\(","\)","\[", and "\]",
> but not in "\\". Maybe if I just remember that at all times I'll be
> right.
.
.
.

What is going on here is not exactly what it appears, because of the
conventions that S uses in printing character strings. In parsing a
string, the backslash ("\") is an escape character. The escape is used,
in particular, to give the octal representation of a character or to
indicate certain special characters such as newline ("\n"). A backslash
appearing before an ordinary character, such as most letters, "escapes"
that character, but when the character has no special meaning the
backslash has no effect and is simply dropped. Therefore, to include a
backslash in a string, it must itself be escaped. See the following
examples:

> "\a" ## backslash escape has no effect
[1] "a"
> "\153" ## backslash for octal representation (happens to be a letter)
[1] "k"
> "\." ## no effect
[1] "."
> "\\" ## backslash escape for a backslash
[1] "\\"
> "\n" ## backslash combination for newline
[1] "\n"

The last two seem odd, but only because the S print() function follows
the same rules as the input parser and writes each special character
back in its escaped form, so the backslash is shown as "\\" and the
newline as "\n". We can easily verify that each of these is really a
single-character string:

> nchar("\\")
[1] 1
> nchar("\n")
[1] 1

and writing such strings out with something other than print() (in
particular, with cat()) shows the string as it is actually stored:

> print("abcd\\efg\nhi\tjk\n")
[1] "abcd\\efg\nhi\tjk\n"
> cat("abcd\\efg\nhi\tjk\n")
abcd\efg
hi jk
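
Applied to the paste() call in the quoted message, the same reasoning
suggests doubling the backslash and checking the result with cat() or
nchar() rather than print(). A small sketch (the name pat is mine, just
for illustration, and I am assuming paste(), nchar() and cat() behave as
in the examples above):

```r
# Doubled backslash in the source, so one real backslash ends up in the string.
pat <- paste("(", "^S\\.", ")", sep = "", collapse = "|")
nchar(pat)                # 6 characters: ( ^ S \ . )
cat(pat, "\n", sep = "")  # shows the stored string: (^S\.)
```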

In your case you want a backslash before the "." in the pattern, to force
grep to match a literal "." rather than treat it as a wildcard for any
character. To get that "\" into the string you have to escape it in the
command, hence
"SP\\."