POSIX shell: does `$` lose its special meaning if it is the last character in a word?
On ash, dash and bash, when I run
$ echo ab$
it returns
ab$
Is this behavior specified by POSIX or is it just a common convention in POSIX-compliant shells? I couldn't find anything on the POSIX Shell Command Language page that mentions this behavior.
shell posix
The better question is "Does $ gain a special meaning if it is the last character in a word?" There is no single special meaning assigned to $; it is used to introduce multiple, but distinct, expansions, like parameter expansion ${...}, command substitution $(...), and arithmetic expressions $((...)). Some shells introduce additional contexts, like ksh's command-substitution variant x=${ echo foo; echo bar;} (which differs from the standard $(...) by not executing the commands in a subshell).
– chepner 21 hours ago
edited yesterday by Sparhawk
asked yesterday by Harold Fischer
3 Answers
A $ followed by a space (or by no character, IMO) is unspecified by POSIX.
The '$' character is used to introduce parameter expansion, command substitution, or arithmetic evaluation. If an unquoted '$' is followed by a character that is not one of the following:
- A numeric character
- The name of one of the special parameters (see Special Parameters)
- A valid first character of a variable name
- A <left-curly-bracket> ( '{' )
- A <left-parenthesis>
the result is unspecified.
To make it explicit, the result for an unquoted $ that is not followed by a character matching this bracket expression:
[0-9@*#?$!_a-zA-Z{(-]
is explicitly unspecified: any result is allowed by POSIX.
That is, POSIX guarantees no particular result, so a script that uses such a $ cannot know from the standard what the shell will do.
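As a rough illustration of that character set, the same check can be written as a shell case statement. This is only a sketch: the name classify_after_dollar is made up for this example, it expects a single character as its argument, and the empty-string branch merely marks the "no character follows" situation that is debated in the comments.

```shell
# classify_after_dollar CHAR  (hypothetical helper, not part of any shell)
# Reports whether POSIX specifies what an unquoted $ followed by CHAR means.
classify_after_dollar() {
    case "$1" in
        '')
            # Nothing follows the $: the disputed end-of-word case.
            echo 'no character' ;;
        [0-9] | [a-zA-Z_] | '@' | '*' | '#' | '?' | '-' | '$' | '!' | '{' | '(')
            # Digit, start of a name, special parameter, brace or parenthesis.
            echo 'specified' ;;
        *)
            echo 'unspecified' ;;
    esac
}

classify_after_dollar a     # specified
classify_after_dollar '+'   # unspecified
classify_after_dollar ' '   # unspecified
```

Quoting each special character as its own pattern avoids having to escape ( and $ inside a single bracket expression.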
A quoted $ (quoted with either " or ') must be followed by the closing quote character, so such a $ cannot be the last character of a word. Keep in mind that a word includes its quotes. From 2.3 Token Recognition:
(…) the result token shall contain exactly the characters that appear in the input (…), unmodified, including any embedded or enclosing quotes (…).
The only other option left is a $ quoted with a backslash, which is interpreted as the "raw" character and loses its special meaning (if any).
Conclusion
So, yes, a trailing $ could lose its special meaning of starting an expansion, either by being backslash-quoted or, while unquoted, by being unspecified.
However, all implementations that I know of accept a trailing $ as part of the preceding word (if any) without any error or warning.
By "trailing" I mean that the following character ends a word (<blank>, |, &, ;, <, >, or NUL) or that the end of input was signaled.
Let's walk through the processing steps of the two basic sequences that interpret the command line. One divides the command line into tokens (and identifies them) and is explained in 2.3 Token Recognition.
echo a$abc
With echo a$abc, at some point the $ gets to be processed:
- step 1: As a $ is not the end of input, keep going.
- steps 2 and 3: The previous character, a, was not an operator; keep going.
- step 4: A $ is not a <backslash>, a single-quote, or a double-quote; keep going.
- step 5: The current character is an unquoted '$' or '`', so the shell shall identify the start of any candidates for parameter expansion.
- step 5: The shell shall read sufficient input to determine the end of the unit to be expanded (as explained in the cited sections).
- We need to go into the cited sections to decide if the token is valid.
- Section 2.6 Word Expansions: The '$' character is used to introduce an expansion.
- The unquoted '$' is followed by an a, which is one of the valid characters.
- Keep reading up to the following <blank> to find the end of the unit to be expanded.
- The $abc thus collected is delimited and tagged as an expansion.
echo a$ a
For echo a$ a, all the steps above are the same until step 8, but here:
- The unquoted '$' is followed by a <blank>, which is not one of the valid characters, so the result is unspecified. Generally, implementations delimit a token that is followed by a space, but do not tag it as an expansion.
echo a$
Again, most of the steps above apply to echo a$, but step 8 changes to:
The unquoted '$' is followed by a NUL (the terminating character of a C string), which is not one of the valid characters. It is therefore unspecified, AFAICT.
Following the comments: if the phrase "If an unquoted '$' is followed by a character (…)" is read to mean that no character follows when what comes next is a NUL, a newline, or some other end-of-input indicator (EOT, etc.), then the $ in echo a$ is not a valid expansion (but does not fall into the unspecified case either), and the token gets delimited but not tagged as an expansion.
In any case, most implementations delimit a token that is followed by an invalid character or by the end of input, but do not tag it as an expansion.
In summary
echo a$abc # expansion of parameter abc, valid and specified.
echo a$#c # expansion of special parameter #, valid and specified.
echo a$+c # unspecified behavior.
echo a$ c # unspecified.
echo a$ # unspecified IMO; unclear under the alternate interpretation.
echo $ # falls into the same issue.
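To see what a particular shell does with the cases in this summary, each command can be handed to a fresh shell. This is only a sketch under a few assumptions: try_case is a made-up helper, sh is whatever shell is installed under that name, and only the first two results are guaranteed by POSIX. The remaining lines just report what that one shell happens to do.

```shell
# try_case SCRIPT  (hypothetical helper)
# Runs SCRIPT in a new sh without positional parameters and shows its output.
try_case() {
    printf '%s => %s\n' "$1" "$(sh -c "$1")"
}

try_case 'unset abc; echo a$abc'   # specified: abc is unset, the command outputs "a"
try_case 'echo a$#c'               # specified: $# is 0 here, the command outputs "a0c"
try_case 'echo a$+c'               # unspecified: depends on the shell
try_case 'echo a$ c'               # result depends on the reading discussed above
try_case 'echo a$'
try_case 'echo $'
```

Passing each command in single quotes keeps the calling shell from interpreting the $ itself, so only the shell under test decides what happens.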
If we’re being fiddly, “if an unquoted ‘$’ is followed by a character that is not ...” and “a $ that is not followed by” are not the same thing. It appears that a $ at the end of input does create a literal word by the letter of the specification with no unspecified behaviour invoked.
– Michael Homer 18 hours ago
@MichaelHomer Are you saying that echo ab$ moreinput invokes unspecified behavior and echo ab$ does not?
– Harold Fischer 16 hours ago
Sorry, what I meant was that a $ followed by nothing at all, as in Harold's echo ab$, isn't "followed by a character that is not ..." because it isn't followed by a character. I wasn't talking about double-quoting (or quoting at all).
– Michael Homer 14 hours ago
I guess not then? Here's where my confusion lies: "$ ... followed by a character that is not ..." requires that $ is followed by a character, and that the character in question is not one on the list. If $ is not followed by anything, it certainly is not followed by a character. This is distinct from "$ not followed by a character on this list", which would encompass both $ followed by unlisted characters and $ not followed by any character.
– Michael Homer 14 hours ago
That is, "followed by a character that is not one of" and "not followed by a character that is one of" are not equivalent.
– Michael Homer 14 hours ago
$ does not have a special meaning by itself (try echo $), only when combined with another character after it to form an expansion, e.g. $var (or ${var}), $(util), $((1+2)).
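A short sketch of those forms side by side, with var being an arbitrary variable name chosen for the example. The last line is the lone $ from the sentence above. The shells named in the question print it literally.

```shell
var=world

echo "$var"         # parameter expansion: world
echo "${var}"       # the same expansion written with braces: world
echo "$(echo hi)"   # command substitution: hi
echo "$((1+2))"     # arithmetic expansion: 3
echo $
```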
The $ gets its "special" meaning as defining an expansion in the POSIX standard under the section Token Recognition:
If the current character is an unquoted $ or `, the shell shall identify the start of any candidates for parameter expansion, command substitution, or arithmetic expansion from their introductory unquoted character sequences: $ or ${, $( or `, and $((, respectively. The shell shall read sufficient input to determine the end of the unit to be expanded. While processing the characters, if instances of expansions or quoting are found nested within the substitution, the shell shall recursively process them in the manner specified for the construct that is found. The characters found from the beginning of the substitution to its end, allowing for any recursion necessary to recognize embedded constructs, shall be included unmodified in the result token, including any embedded or enclosing substitution operators or quotes. The token shall not be delimited by the end of the substitution.
So, if $ does not form an expansion, other parsing rules come into effect:
If the previous character was part of a word, the current character shall be appended to that word.
That covers your ab$ string.
In the case of a lone $ (the "new word" would be the $ by itself):
The current character is used as the start of a new word.
The meaning of the generated word containing a $ that is not a standard expansion is explicitly defined as unspecified by POSIX.
Also note that $ is the last character in $$, but that this also happens to be the special parameter that holds the current shell's PID. In bash, !$ may invoke a history expansion (the last argument of the previous command). So in general, no, $ is not without meaning at the end of an unquoted word, but at the end of a word it does at least not denote a standard expansion.
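The $$ case is easy to check in a script. A minimal sketch, assuming nothing beyond a POSIX sh, where pid is just a variable name picked for the example:

```shell
# Here the final $ is the name of a special parameter, not a literal character.
pid=$$

case "$pid" in
    '' | *[!0-9]*) echo "unexpected value: $pid" ;;
    *)             echo "\$\$ expanded to a process ID" ;;
esac
```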
@Isaac I deleted the parenthesis that I'm assuming you're referring to.
– Kusalananda yesterday
@Isaac Ah, I see. I thought "left unspecified" would mean the same as "defined as unspecified", but I could definitely make that wording better.
– Kusalananda yesterday
It seems reasonable that "In the case of a lone $ (the "new word" would be the $ by itself)", and that is what has been generally implemented, but what sounds reasonable and what the spec states don't always match. What the spec clearly states is: "not followed by (…)", and a trailing $ character is "not followed by (…)". So, it is explicitly unspecified.
– Isaac yesterday
@Isaac Yes. This is what I say. The section I'm quoting is on token recognition. The lone $ is recognised as a word token. It is not recognised as an expansion. The meaning of that lone $ word, i.e. the action that the shell takes, is unspecified. This is what I say.
– Kusalananda yesterday
@Isaac (sigh), yes, and this is what I now have in my answer: "The meaning of the generated word containing a $ that is not a standard expansion is explicitly defined as unspecified by POSIX." What do you suggest that I reword it as?
– Kusalananda yesterday
Depending on the exact situation, this is either explicitly unspecified (so implementations may do as they will) or required to happen as you observed. In your exact scenario, echo ab$, POSIX mandates the output "ab$" that you observed, and it is not unspecified. A quick summary of all the different cases is at the end.
There are two elements: first tokenising into words, and then interpretation of those words.
Tokenisation
POSIX tokenisation requires a $ that is not the start of a valid parameter expansion, command substitution, or arithmetic substitution to be considered a literal part of the WORD token being constructed. This is because rule 5 ("If the current character is an unquoted $ or `, the shell shall identify the start of any candidates for parameter expansion, command substitution, or arithmetic expansion from their introductory unquoted character sequences: $ or ${, $( or `, and $((, respectively") does not apply, as none of those expansions are viable there. Parameter expansion requires a valid name to appear there, and an empty name is not valid.
Since this rule did not apply, we continue through the rules until we find one that does. The two candidates are #8 ("If the previous character was part of a word, the current character shall be appended to that word.") and #10 ("The current character is used as the start of a new word."), which apply to echo a$ and echo $ respectively.
There is also a third case of the form echo a$+b which falls through the same crack, since + is not the name of a special parameter. We'll return to this one later, since it triggers different parts of the rules.
The specification thus requires that the $ be considered a part of the word syntactically, and it can then be further processed later on.
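One way to observe the result of tokenisation is to count the words a command receives. The following is a sketch, with count_args being a made-up helper. In the shells named in the question (ash, dash, bash) the three calls print 1, 1 and 2.

```shell
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() {
    echo "$#"
}

# Each call ends its line, so the $ really is the last character of its word.
count_args ab$
count_args $
count_args a$ b$
```

The first call shows rule #8 at work (the $ is appended to the word ab), the second shows rule #10 (the $ starts a new word), and the third shows that each word is handled on its own.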
Word expansion
After the input has been parsed in this way, with the $ included in the word, word expansions are applied to each of the words that have been read. Each word is processed individually.
It is specified that:
If an unquoted '$' is followed by a character that is not one of the following:
- A numeric character
- The name of one of the special parameters (see Special Parameters)
- A valid first character of a variable name
- A <left-curly-bracket> ( '{' )
- A <left-parenthesis>
the result is unspecified.
"Unspecified" is a particular term here meaning that
- A conforming shell can choose any behaviour in this case
- A conforming application cannot rely on any particular behaviour
In your example, echo ab$, the $ is not followed by any character, so this rule does not apply and the unspecified result is not invoked. There is simply no expansion triggered by the $, so it is literally present and printed out.
Where it would apply is in our third case from above: echo a$+b. Here $ is followed by +, which is not a number, special parameter (@, *, #, ?, -, $, !, or 0), start of a variable name (underscore or an alphabetic from the portable character set), or one of the brackets. In this case, the behaviour is unspecified: a conforming shell is permitted to invent a special parameter called + to expand, and a conforming application should not assume that the shell does not. The shell could do anything else it liked as well, including reporting an error.
For example, zsh, including in its POSIX mode, interprets $+b as "is variable b set" and substitutes either 1 or 0 in its place. It similarly has extensions for ~ and =. This is conforming behaviour.
Another place this could happen is echo "a$ b". Again, the shell is permitted to do as it wishes, and you as the script author should escape the $ if you want literal output. If you don't, it may work, but you can't rely on it. This is the absolute letter of the specification, but I don't think this sort of granularity was intended or considered.
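For reference, a sketch of the usual ways to get a literal $ without relying on unspecified behaviour. All three lines print a$ b, and printf is used here only to keep the output independent of echo's own quirks.

```shell
printf '%s\n' 'a$ b'    # single quotes: every character is literal
printf '%s\n' "a\$ b"   # backslash before the $ inside double quotes
printf '%s\n' a\$\ b    # no quotes: backslash-escape the $ and the space
```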
In summary
echo ab$: literal output, fully specified
echo a$ b: literal output, fully specified
echo a$ b$: literal output, fully specified
echo a$b: expansion of parameter b, fully specified
echo a$-b: expansion of special parameter -, fully specified
echo a$+b: unspecified behaviour
echo "a$ b": unspecified behaviour
For a $ at the end of a word, you are permitted to rely on the behaviour: it must be treated literally and passed on to the echo command as part of its argument. That is a conformance requirement on the shell.
Awesome summary at the end
– Harold Fischer 15 hours ago
As an aside, this also means echo a$ b$ would be fully specified, correct?
– Harold Fischer 15 hours ago
@HaroldFischer Yes, each word could have its own $.
– Michael Homer 15 hours ago
Wow, thanks. Great teaching. I get the output of a0 in zsh 5.6.2 for echo a$+b. I understand it's the invention of zsh.
– Christopher 14 hours ago
@Christopher, yes, in zsh $+var or ${+var} expands to 1 if $var is set and 0 otherwise (see also $#var from csh to get the number of elements of the array (or length of a variable)). Note that zsh only tries to be POSIX compliant in sh emulation (when invoked as sh or after emulate sh). In sh emulation, $#var is disabled as $#var means something different in POSIX/Bourne, but $+var is not, as $+var is unspecified anyway by POSIX. See also $[var], unspecified by POSIX, but used as arithmetic expansion by bash/zsh (based on an early POSIX draft IIRC).
– Stéphane Chazelas 14 hours ago
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492397%2fposix-shell-does-lose-its-special-meaning-if-it-is-the-last-character-in-a%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
A $
followed by an space (or no character (IMO)) is unspecified by POSIX.
The '$' character is used to introduce parameter expansion, command substitution, or arithmetic evaluation. If an unquoted '$' is followed by a character that is not one of the following:
- A numeric character
- The name of one of the special parameters (see Special Parameters)
- A valid first character of a variable name
- A <left-curly-bracket> ( '{' )
- A <left-parenthesis>
the result is unspecified.
To make it explicit, an unquoted $
that is not followed by a character in this regex:
[0-9@*#?$!_a-zA-Z{(-]
is explicitly unspecified: any result is allowed by POSIX.
That is: any specific result is not guaranteed by POSIX.
Or, if used, there is not way to know what would be done by following POSIX.
A quoted $
(either with "
or '
) must be followed by the quoting character, so, a $
could not be the last character of a word. Understand that a word must contain the quotes. From [2.3 Token Recognition][3]
(…) the result token shall contain exactly the characters that appear in the input (…), unmodified, including any embedded or enclosing quotes (…).
The only other option left is a quoted $
with a backslash, which is to be interpreted as the "raw" character losing its special meaning (if any).
Conclusion
So, yes, a trailing could $
lose its special meaning of starting an expansion, either by being backlash quoted or, while unquoted, by being unspecified.
However, all implementations that I know of accept a trailing $
as part of the preceding word (if any) without any error or warning.
In trailing I mean that the following character ends a word (<blank>, |
, &
, ;
, <
, >
, or NUL) or the end of input was signaled.
Let's take a walk through the processing steps of the two basic sequences that interpret the command line. One divides the command line into tokens (and identify them) and is explained in 2.3 Token Recognition
echo a$a
With echo a$abc
, at some point, the $
gets to be processed:
- step 1: As a
$
is not the end of input, keep going. - steps 2 and 3: The previous character
a
was not an operator, keep going. - step 4: A
$
is not a <backslash>, a single-quote, or a double-quote, keep going. - step 5: The current character is an unquoted '$' or '`', the shell shall identify the start of any candidates for parameter expansion.
- step 5: The shell shall read sufficient input to determine the end of the unit to be expanded (as explained in the cited sections).
- Needs to go into the cited sections to decide if the token is valid.
- Section 2.6 Word Expansions: The '$' character is used to introduce an expansion.
- The unquoted '$' is followed by an
a
, it is one of the valid characters. - Keep reading to the following <blank> to find the end of the unit to be expanded
- The
$abc
thus collected is delimited and tagged as an expansion.
echo a$ a
For echo a$ a
all the steps above are the same until step 8, but here:
- The unquoted '$' is followed by a <blank>, it is not one of the valid characters, it is therefore unspecified. Generally, implementations delimit the token that is followed by an space, but do not tag it as an expansion.
echo a$
Again, most of the steps above apply for echo a$
, but 8 change to:
The unquoted '$' is followed by a NUL (the terminating character for a C string), it is therefore not one of the valid characters. It is therefore unspecified AFAICT.
Following comments: If the interpretation of the string: If an unquoted '$' is followed by a character (…) is to claim that no character follows when there is a following NUL, newline, or that the end of the input was reached by some other indicator (EOT, etc.), then, the
echo a$
is not a valid expansion (but doesn't fall into the unspecified case), the token gets delimited but not tagged as an expansion.
Most implementations delimit the token that is followed by an invalid character or by the end of input, but do not tag it as an expansion, anyway.
In summary
echo a$abc # expansion of parameter
abc, is valid and specified.
echo a$#c # expansion of special parameter
#, valid and specified.
echo a$+c # unspecified behavior.
echo a$ c # is unspecified.
echo a$ # unspecified IMO, unclear by an alternate interpretation.
echo $ # Falls into the same issue.
2
If we’re being fiddly, “if an unquoted ‘$’ is followed by a character that is not ...” and “a$
that is not followed by” are not the same thing. It appears that a$
at the end of input does create a literal word by the letter of the specification with no unspecified behaviour invoked.
– Michael Homer
18 hours ago
@MichaelHomer Are you saying thatecho ab$ moreinput
invokes unspecified behavior andecho ab$
does not?
– Harold Fischer
16 hours ago
1
Sorry, what I meant was that a$
followed by nothing at all, as in Harold'secho ab$
, isn't "followed by a character that is not ..." because it isn't followed by a character. I wasn't talking about double-quoting (or quoting at all).
– Michael Homer
14 hours ago
1
I guess not then? Here's where my confusion lies: "$
... followed by a character that is not ..." requires that$
is followed by a character, and that the character in question is not one on the list. If$
is not followed by anything, it certainly is not followed by a character. This is distinct from "$
not followed by a character on this list", which would encompass both$
followed by unlisted characters and$
not followed by any character.
– Michael Homer
14 hours ago
1
That is, "followed by a character that is not one of" and "not followed by a character that is one of" are not equivalent.
– Michael Homer
14 hours ago
|
show 10 more comments
A $
followed by an space (or no character (IMO)) is unspecified by POSIX.
The '$' character is used to introduce parameter expansion, command substitution, or arithmetic evaluation. If an unquoted '$' is followed by a character that is not one of the following:
- A numeric character
- The name of one of the special parameters (see Special Parameters)
- A valid first character of a variable name
- A <left-curly-bracket> ( '{' )
- A <left-parenthesis>
the result is unspecified.
To make it explicit, an unquoted $
that is not followed by a character in this regex:
[0-9@*#?$!_a-zA-Z{(-]
is explicitly unspecified: any result is allowed by POSIX.
That is: any specific result is not guaranteed by POSIX.
Or, if used, there is not way to know what would be done by following POSIX.
A quoted $
(either with "
or '
) must be followed by the quoting character, so, a $
could not be the last character of a word. Understand that a word must contain the quotes. From [2.3 Token Recognition][3]
(…) the result token shall contain exactly the characters that appear in the input (…), unmodified, including any embedded or enclosing quotes (…).
The only other option left is a quoted $
with a backslash, which is to be interpreted as the "raw" character losing its special meaning (if any).
Conclusion
So, yes, a trailing could $
lose its special meaning of starting an expansion, either by being backlash quoted or, while unquoted, by being unspecified.
However, all implementations that I know of accept a trailing $
as part of the preceding word (if any) without any error or warning.
In trailing I mean that the following character ends a word (<blank>, |
, &
, ;
, <
, >
, or NUL) or the end of input was signaled.
Let's take a walk through the processing steps of the two basic sequences that interpret the command line. One divides the command line into tokens (and identify them) and is explained in 2.3 Token Recognition
echo a$a
With echo a$abc
, at some point, the $
gets to be processed:
- step 1: As a
$
is not the end of input, keep going. - steps 2 and 3: The previous character
a
was not an operator, keep going. - step 4: A
$
is not a <backslash>, a single-quote, or a double-quote, keep going. - step 5: The current character is an unquoted '$' or '`', the shell shall identify the start of any candidates for parameter expansion.
- step 5: The shell shall read sufficient input to determine the end of the unit to be expanded (as explained in the cited sections).
- Needs to go into the cited sections to decide if the token is valid.
- Section 2.6 Word Expansions: The '$' character is used to introduce an expansion.
- The unquoted '$' is followed by an
a
, it is one of the valid characters. - Keep reading to the following <blank> to find the end of the unit to be expanded
- The
$abc
thus collected is delimited and tagged as an expansion.
echo a$ a
For echo a$ a
all the steps above are the same until step 8, but here:
- The unquoted '$' is followed by a <blank>, it is not one of the valid characters, it is therefore unspecified. Generally, implementations delimit the token that is followed by an space, but do not tag it as an expansion.
echo a$
Again, most of the steps above apply for echo a$
, but 8 change to:
The unquoted '$' is followed by a NUL (the terminating character for a C string), it is therefore not one of the valid characters. It is therefore unspecified AFAICT.
Following comments: If the interpretation of the string: If an unquoted '$' is followed by a character (…) is to claim that no character follows when there is a following NUL, newline, or that the end of the input was reached by some other indicator (EOT, etc.), then, the
echo a$
is not a valid expansion (but doesn't fall into the unspecified case), the token gets delimited but not tagged as an expansion.
Most implementations delimit the token that is followed by an invalid character or by the end of input, but do not tag it as an expansion, anyway.
In summary
echo a$abc # expansion of parameter
abc, is valid and specified.
echo a$#c # expansion of special parameter
#, valid and specified.
echo a$+c # unspecified behavior.
echo a$ c # is unspecified.
echo a$ # unspecified IMO, unclear by an alternate interpretation.
echo $ # Falls into the same issue.
2
If we’re being fiddly, “if an unquoted ‘$’ is followed by a character that is not ...” and “a$
that is not followed by” are not the same thing. It appears that a$
at the end of input does create a literal word by the letter of the specification with no unspecified behaviour invoked.
– Michael Homer
18 hours ago
@MichaelHomer Are you saying thatecho ab$ moreinput
invokes unspecified behavior andecho ab$
does not?
– Harold Fischer
16 hours ago
1
Sorry, what I meant was that a$
followed by nothing at all, as in Harold'secho ab$
, isn't "followed by a character that is not ..." because it isn't followed by a character. I wasn't talking about double-quoting (or quoting at all).
– Michael Homer
14 hours ago
1
I guess not then? Here's where my confusion lies: "$
... followed by a character that is not ..." requires that$
is followed by a character, and that the character in question is not one on the list. If$
is not followed by anything, it certainly is not followed by a character. This is distinct from "$
not followed by a character on this list", which would encompass both$
followed by unlisted characters and$
not followed by any character.
– Michael Homer
14 hours ago
1
That is, "followed by a character that is not one of" and "not followed by a character that is one of" are not equivalent.
– Michael Homer
14 hours ago
|
show 10 more comments
A $
followed by an space (or no character (IMO)) is unspecified by POSIX.
The '$' character is used to introduce parameter expansion, command substitution, or arithmetic evaluation. If an unquoted '$' is followed by a character that is not one of the following:
- A numeric character
- The name of one of the special parameters (see Special Parameters)
- A valid first character of a variable name
- A <left-curly-bracket> ( '{' )
- A <left-parenthesis>
the result is unspecified.
To make it explicit, an unquoted $
that is not followed by a character in this regex:
[0-9@*#?$!_a-zA-Z{(-]
is explicitly unspecified: any result is allowed by POSIX.
That is: any specific result is not guaranteed by POSIX.
Or, if used, there is not way to know what would be done by following POSIX.
A $ quoted with " or ' must be followed by the closing quote character, so it cannot be the last character of a word. Understand that a word must contain the quotes. From 2.3 Token Recognition:
(…) the result token shall contain exactly the characters that appear in the input (…), unmodified, including any embedded or enclosing quotes (…).
The only other option left is a $ quoted with a backslash, which is to be interpreted as the "raw" character, losing its special meaning (if any).
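For a script that needs a literal trailing $ regardless of how the unquoted case is read, quoting removes the question. A small sketch (printf is used instead of echo only to keep the output predictable):

```sh
# Both commands pass the two characters a and $ to printf as one argument.
printf '%s\n' a\$     # backslash-quoted $
printf '%s\n' 'a$'    # single-quoted $
```

Both lines print a$ and neither depends on what a shell does with an unquoted $.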
Conclusion
So, yes, a trailing $ could lose its special meaning of starting an expansion, either by being backslash-quoted or, while unquoted, by being unspecified.
However, all implementations that I know of accept a trailing $ as part of the preceding word (if any) without any error or warning.
By trailing I mean that the following character ends a word (<blank>, |, &, ;, <, >, or NUL) or that the end of input was signaled.
Let's take a walk through the processing steps of the two basic sequences that interpret the command line. One divides the command line into tokens (and identifies them) and is explained in 2.3 Token Recognition.
echo a$abc
With echo a$abc, at some point, the $ gets to be processed:
- step 1: As a $ is not the end of input, keep going.
- steps 2 and 3: The previous character a was not an operator, keep going.
- step 4: A $ is not a <backslash>, a single-quote, or a double-quote, keep going.
- step 5: The current character is an unquoted '$' or '`', the shell shall identify the start of any candidates for parameter expansion.
- step 5: The shell shall read sufficient input to determine the end of the unit to be expanded (as explained in the cited sections).
- Needs to go into the cited sections to decide if the token is valid.
- Section 2.6 Word Expansions: The '$' character is used to introduce an expansion.
- The unquoted '$' is followed by an a, it is one of the valid characters.
- Keep reading to the following <blank> to find the end of the unit to be expanded.
- The $abc thus collected is delimited and tagged as an expansion.
echo a$ a
For echo a$ a all the steps above are the same until step 8, but here:
- The unquoted '$' is followed by a <blank>, which is not one of the valid characters; it is therefore unspecified. Generally, implementations delimit the token that is followed by a space, but do not tag it as an expansion.
echo a$
Again, most of the steps above apply for echo a$, but step 8 changes to:
The unquoted '$' is followed by a NUL (the terminating character for a C string), which is not one of the valid characters. It is therefore unspecified AFAICT.
Following the comments: if the phrase "If an unquoted '$' is followed by a character (…)" is read to mean that no character follows when what comes next is a NUL, a newline, or an end of input signaled by some other indicator (EOT, etc.), then echo a$ is not a valid expansion (but doesn't fall into the unspecified case), and the token gets delimited but not tagged as an expansion.
Most implementations delimit the token that is followed by an invalid character or by the end of input, but do not tag it as an expansion, anyway.
In summary
echo a$abc # expansion of parameter abc, is valid and specified.
echo a$#c # expansion of special parameter #, valid and specified.
echo a$+c # unspecified behavior.
echo a$ c # is unspecified.
echo a$ # unspecified IMO, unclear by an alternate interpretation.
echo $ # Falls into the same issue.
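To see what a particular shell does with some of these cases, they can be passed to it as single-quoted scripts. This is an observation aid only: the output shows one implementation's behaviour and says nothing about what POSIX guarantees. The value X given to abc and the use of sh as the shell under test are assumptions made for the example.

```sh
# Run each case in a fresh shell so that the $ reaches that shell's parser
# untouched (single quotes keep the current shell from expanding anything).
for script in 'echo a$abc' 'echo a$ c' 'echo a$' 'echo $'; do
    printf '%-12s -> ' "$script"
    abc=X sh -c "$script"
done
```

Replacing sh with another shell name (dash, bash, and so on) repeats the check against that implementation.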
edited 7 hours ago
answered yesterday
Isaac
$ does not have a special meaning by itself (try echo $), only when combined with other characters after it to form an expansion, e.g. $var (or ${var}), $(util), $((1+2)).
The $ gets its "special" meaning as defining an expansion in the POSIX standard under the section Token Recognition:
If the current character is an unquoted $ or `, the shell shall identify the start of any candidates for parameter expansion, command substitution, or arithmetic expansion from their introductory unquoted character sequences: $ or ${, $( or `, and $((, respectively. The shell shall read sufficient input to determine the end of the unit to be expanded. While processing the characters, if instances of expansions or quoting are found nested within the substitution, the shell shall recursively process them in the manner specified for the construct that is found. The characters found from the beginning of the substitution to its end, allowing for any recursion necessary to recognize embedded constructs, shall be included unmodified in the result token, including any embedded or enclosing substitution operators or quotes. The token shall not be delimited by the end of the substitution.
So, if $ does not form an expansion, other parsing rules come into effect:
If the previous character was part of a word, the current character shall be appended to that word.
That covers your ab$ string.
In the case of a lone $ (the "new word" would be the $ by itself):
The current character is used as the start of a new word.
The meaning of the generated word containing a $ that is not a standard expansion is explicitly defined as unspecified by POSIX.
Also note that $ is the last character in $$, but that this also happens to be the special parameter that holds the current shell's PID. In bash, !$ may invoke a history expansion (the last argument of the previous command). So in general, no, $ is not without meaning at the end of an unquoted word, but at the end of a word it does at least not denote a standard expansion.
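The difference between a trailing $ and a trailing $$ can be observed directly. A sketch, with the prefix pid and the variable names literal and expanded chosen arbitrarily; the first result is what ash, dash and bash are reported to do, not something every reading of the standard guarantees:

```sh
# A single trailing $ is kept in the word by the shells named above;
# $$ is the special parameter holding the shell's process ID and is expanded.
literal=pid$
expanded=pid$$
printf '%s\n' "$literal" "$expanded"
```

The first line of output is pid$ in those shells; the second is pid followed by the process ID of the shell running the snippet.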
@Isaac I deleted the parenthesis that I'm assuming you're referring to.
– Kusalananda
yesterday
@Isaac Ah, I see. I thought "left unspecified" would mean the same as "defined as unspecified", but I could definitely make that wording better.
– Kusalananda
yesterday
It seems reasonable that In the case of a lone $ (the "new word" would be the $ by itself): and that is what has been generally implemented, but what sounds reasonable and what the spec states don't always match. What the spec clearly states is: "not followed by (…)", and a trailing $ character is "not followed by (…)". So, it is explicitly unspecified.
– Isaac
yesterday
@Isaac Yes. This is what I say. The section I'm quoting is on token recognition. The lone $ is recognised as a word token. It is not recognised as an expansion. The meaning of that lone $ word, i.e. the action that the shell takes, is unspecified. This is what I say.
– Kusalananda
yesterday
@Isaac (sigh), yes, and this is what I now have in my answer: "The meaning of the generated word containing a $ that is not a standard expansion is explicitly defined as unspecified by POSIX." What do you suggest that I reword it as?
– Kusalananda
yesterday
edited yesterday
answered yesterday
Kusalananda
Depending on the exact situation, this is either explicitly unspecified (so implementations may do as they will) or required to happen as you observed. In your exact scenario, echo ab$, POSIX mandates the output "ab$" that you observed and it is not unspecified. A quick summary of all the different cases is at the end.
There are two elements: first tokenising into words, and then interpretation of those words.
Tokenisation
POSIX tokenisation requires a $ that is not the start of a valid parameter expansion, command substitution, or arithmetic substitution to be considered a literal part of the WORD token being constructed. This is because rule 5 ("If the current character is an unquoted $ or `, the shell shall identify the start of any candidates for parameter expansion, command substitution, or arithmetic expansion from their introductory unquoted character sequences: $ or ${, $( or `, and $((, respectively") does not apply, as none of those expansions are viable there. Parameter expansion requires a valid name to appear there, and an empty name is not valid.
Since this rule did not apply, we continue through the rules until we find one that does. The two candidates are #8 ("If the previous character was part of a word, the current character shall be appended to that word.") and #10 ("The current character is used as the start of a new word."), which apply to echo a$ and echo $ respectively.
There is also a third case of the form echo a$+b, which falls through the same crack, since + is not the name of a special parameter. This one we'll return to later, since it triggers different parts of the rules.
The specification thus requires that the $ be considered a part of the word syntactically, and it can then be further processed later on.
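One way to observe that the $ stays inside its word is to count and print the arguments a command receives. A minimal sketch; the function name show_args is invented for the example:

```sh
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    printf '%d:' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

# Three words: one ending in $, one that is a lone $, and a plain one.
show_args a$ $ b
```

A shell that tokenises as described passes three arguments, a$, $ and b, so rule #8 and rule #10 each show up as one bracketed word.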
Word expansion
After the input has been parsed in this way, with the $ included in the word, word expansions are applied to each of the words that have been read. Each word is processed individually.
It is specified that:
If an unquoted '$' is followed by a character that is not one of the following:
- A numeric character
- The name of one of the special parameters (see Special Parameters)
- A valid first character of a variable name
- A <left-curly-bracket> ( '{' )
- A <left-parenthesis>
the result is unspecified.
"Unspecified" is a particular term here meaning that
- A conforming shell can choose any behaviour in this case
- A conforming application cannot rely on any particular behaviour
In your example, echo ab$, the $ is not followed by any character, so this rule does not apply and the unspecified result is not invoked. There is simply no expansion triggered by the $, so it is literally present and printed out.
Where it would apply is in our third case from above: echo a$+b. Here $ is followed by +, which is not a number, special parameter (@, *, #, ?, -, $, !, or 0), start of a variable name (underscore or an alphabetic from the portable character set), or one of the brackets. In this case, the behaviour is unspecified: a conforming shell is permitted to invent a special parameter called + to expand, and a conforming application should not assume that the shell does not. The shell could do anything else it liked as well, including reporting an error.
For example, zsh, including in its POSIX mode, interprets $+b as "is variable b set" and substitutes either 1 or 0 in its place. It similarly has extensions for ~ and =. This is conforming behaviour.
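A script that wants an "is b set" test without relying on unspecified syntax can use the standard ${parameter+word} form, which expands to word when the parameter is set and to nothing otherwise. A short sketch, reusing the variable name b from the example above:

```sh
unset b
echo "a${b+1}"    # b is unset: prints a
b=
echo "a${b+1}"    # b is set (even though empty): prints a1
```

Unlike the zsh extension, this form yields an empty string rather than 0 for the unset case, so a script that needs the 0 has to supply it itself.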
Another place this could happen is echo "a$ b". Again, the shell is permitted to do as it wishes, and you as the script author should escape the $ if you want literal output. If you don't, it may work, but you can't rely on it. This is the absolute letter of the specification, but I don't think this sort of granularity was intended or considered.
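For the two unspecified cases, a script that wants the literal text can quote or escape the $ so that it is never unquoted in the first place. The following is a minimal sketch of that; the variable names are illustrative only.

```shell
# Unquoted $+ is unspecified, so make the $ literal explicitly.
plus_single='a$+b'      # single quotes: every character is literal
plus_escaped=a\$+b      # backslash: only the $ is escaped

# Inside double quotes, a backslash before $ keeps it literal.
space_escaped="a\$ b"
space_single='a$ b'     # single quotes work here as well

printf '%s\n' "$plus_single" "$plus_escaped" "$space_escaped" "$space_single"
```

Each pair yields the same string, a$+b and a$ b respectively, without depending on how a particular shell treats the unspecified forms.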
In summary
- echo ab$: literal output, fully specified
- echo a$ b: literal output, fully specified
- echo a$ b$: literal output, fully specified
- echo a$b: expansion of parameter b, fully specified
- echo a$-b: expansion of special parameter -, fully specified
- echo a$+b: unspecified behaviour
- echo "a$ b": unspecified behaviour
For a $ at the end of a word, you are permitted to rely on the behaviour and it must be treated literally and passed on to the echo command as part of its argument. That is a conformance requirement on the shell.
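The fully specified expansion cases from the summary can be put side by side with the literal one. This is only a sketch: the value X and the variable names are chosen for illustration, and the content of $- depends on which options the running shell has set.

```shell
b=X
with_param=a$b      # $b is a parameter expansion, so this is aX
with_flags=a$-b     # $- expands to the current option flags, then a literal b
literal=ab$         # the trailing $ introduces nothing and stays literal

printf '%s\n' "$with_param" "$with_flags" "$literal"
```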
Awesome summary at the end
– Harold Fischer
15 hours ago
As an aside, this also means echo a$ b$ would be fully specified, correct?
– Harold Fischer
15 hours ago
@HaroldFischer Yes, each word could have its own $.
– Michael Homer
15 hours ago
Wow, thanks. Great teaching. I get the output of a0 in zsh 5.6.2 for echo a$+b. I understand it's the invention of zsh.
– Christopher
14 hours ago
@Christopher, yes, in zsh $+var or ${+var} expands to 1 if $var is set and 0 otherwise (see also $#var from csh to get the number of elements of the array, or the length of a variable). Note that zsh only tries to be POSIX compliant in sh emulation (when invoked as sh or after emulate sh). In sh emulation, $#var is disabled as $#var means something different in POSIX/Bourne, but $+var is not, as $+var is unspecified anyway by POSIX. See also $[var], unspecified by POSIX but used as arithmetic expansion by bash/zsh (based on an early POSIX draft IIRC).
– Stéphane Chazelas
14 hours ago
edited 14 hours ago
answered 15 hours ago
Michael Homer