Re: quote removal issues within character class

Path csiph.com!xmission!news.snarked.org!news.linkpendium.com!news.linkpendium.com!panix!usenet.stanford.edu!not-for-mail
From Oğuz <oguzismailuysal@gmail.com>
Newsgroups gnu.bash.bug
Subject Re: quote removal issues within character class
Date Sat, 9 Nov 2019 16:45:31 +0200
Lines 229
Approved bug-bash@gnu.org
Message-ID <mailman.1106.1573310740.13325.bug-bash@gnu.org>
References <CAH7i3LrZFvJ1ELJzTdQzF1tTqhi9FDrA7xWWYWrd4RgWJs0Wtg@mail.gmail.com> <16736.1573257142@jinx.noi.kre.to> <CAH7i3Lr68CiVXLR9_HoOgQa7Vd-zyVZ+fck-0K3uQPTNSirU2Q@mail.gmail.com> <14879.1573303743@jinx.noi.kre.to> <CAH7i3Lr9OBsjR4cDmEHGv-9nLhDdQbnJSXGODkqT+EZCFZ57VQ@mail.gmail.com>
NNTP-Posting-Host lists.gnu.org
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Content-Transfer-Encoding quoted-printable
X-Trace usenet.stanford.edu 1573310740 25987 209.51.188.17 (9 Nov 2019 14:45:40 GMT)
X-Complaints-To action@cs.stanford.edu
Cc "bug-bash@gnu.org" <bug-bash@gnu.org>
To Robert Elz <kre@munnari.oz.au>
Envelope-to bug-bash@gnu.org
In-Reply-To <14879.1573303743@jinx.noi.kre.to>
X-BeenThere bug-bash@gnu.org
X-Mailman-Version 2.1.23
Precedence list
List-Id Bug reports for the GNU Bourne Again SHell <bug-bash.gnu.org>
List-Unsubscribe <https://lists.gnu.org/mailman/options/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=unsubscribe>
List-Archive <https://lists.gnu.org/archive/html/bug-bash>
List-Post <mailto:bug-bash@gnu.org>
List-Help <mailto:bug-bash-request@gnu.org?subject=help>
List-Subscribe <https://lists.gnu.org/mailman/listinfo/bug-bash>, <mailto:bug-bash-request@gnu.org?subject=subscribe>
X-Mailman-Original-Message-ID <CAH7i3Lr9OBsjR4cDmEHGv-9nLhDdQbnJSXGODkqT+EZCFZ57VQ@mail.gmail.com>
X-Mailman-Original-References <CAH7i3LrZFvJ1ELJzTdQzF1tTqhi9FDrA7xWWYWrd4RgWJs0Wtg@mail.gmail.com> <16736.1573257142@jinx.noi.kre.to> <CAH7i3Lr68CiVXLR9_HoOgQa7Vd-zyVZ+fck-0K3uQPTNSirU2Q@mail.gmail.com> <14879.1573303743@jinx.noi.kre.to>
Xref csiph.com gnu.bash.bug:15582



You've already answered it, thank you. I didn't know that [:, [. and [= were
special *sequences*; I guess I overlooked that part. Thanks again for
taking the time to explain it in detail, I'm grateful.


On Saturday, 9 November 2019, Robert Elz <kre@munnari.oz.au> wrote:

>     Date:        Sat, 9 Nov 2019 07:35:16 +0300
>     From:        Oğuz <oguzismailuysal@gmail.com>
>     Message-ID:  <CAH7i3Lr68CiVXLR9_HoOgQa7Vd-zyVZ+fck-0K3uQPTNSirU2Q@mail.gmail.com>
>
>   | is correct, as "foo" does not contain a ']' which would be required
>   | > to match there (quoting the ':' means there is no character class,
>   | > hence we have instead (the negation of) a char class containing '['
>   | > ':' 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
>   | > followed by ']' and anything.   foo does not match. f]oo would.
>   | >
>   |
>   | where exactly is this documented in the standard?
>
> I'm not sure which part exactly you're looking for, but char sets in sh
> are specified to be the same as in REs, except that ! replaces ^ as the
> negation character (that's in XCU 2.13.1).  Char sets (bracket expressions)
> in REs are documented in XBD 9.3.5, wherein it states
>
>         A bracket expression is either a matching list expression or a
>         non-matching list expression. It consists of one or more
>         expressions: ordinary characters, collating elements, collating
>         symbols, equivalence classes, character classes, or range
>         expressions. The <right-square-bracket> (']') shall lose its
>         special meaning and represent itself in a bracket expression if
>         it occurs first in the list (after an initial <circumflex> ('^'),
>         if any).
>
>         Otherwise, it shall terminate the bracket expression,
>
> That is, a ']' that occurs anywhere else terminates the bracket
> expression, except:
>
>         unless it appears in a collating symbol (such as "[.].]")
>
> (not relevant in the given example)
>
>         or is the ending <right-square-bracket> for a collating symbol,
>         equivalence class, or character class.
>
> So the ']' that immediately follows the second ':' would not terminate the
> bracket expression if it is the ending ']' for a character class
> (collating symbols and equiv classes not being relevant to the example).
> Of course, that can only happen if there is a character class to end.
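A minimal sketch of the two ']' rules quoted above, using sh case patterns, which use the same bracket-expression syntax with ! in place of ^. The helper name matches is hypothetical, defined only for this illustration; it expands the pattern unquoted so that its characters keep their pattern meaning rather than being compared literally.

```shell
#!/bin/sh
# matches WORD PATTERN: print "yes" if WORD matches the sh pattern PATTERN.
# Hypothetical helper for this sketch only. $2 is expanded unquoted in the
# case pattern, so its characters keep their pattern meaning.
matches() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

# A ']' that comes first in the list (or first after '!') is an ordinary member.
matches ']' '[]abc]'     # yes
matches 'x' '[]abc]'     # no
matches 'x' '[!]abc]'    # yes
matches ']' '[!]abc]'    # no

# Anywhere else the first ']' ends the bracket expression, so the
# trailing "c]" in [ab]c] has to match literally.
matches 'ac]' '[ab]c]'   # yes
matches 'c'   '[ab]c]'   # no
```

The pattern is passed through a variable here only because none of these patterns rely on shell quoting; a quoted character inside a variable's value would not count as quoted.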
>
> There's also
>
>         The special characters '.', '*', '[', and '\\'
>         (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>         respectively) shall lose their special meaning within a bracket
>         expression.
>
> whereupon if the [": sequence does not start a char class, the '[' there
> is simply a literal char inside the bracket expression.
>
> Similarly, if the bracket expression ends at the first ']' (the one
> immediately after the second ':'), the following ']' is simply a literal
> character, as ']' chars are special only when following a '['.
>
> So, all that's left to determine is whether the [": sequence can be
> considered as beginning a char class.
>
> In an RE it certainly cannot: quote chars (' and ") are not special in
> REs at all, and [": is no different syntactically than [x:, which no-one
> would treat as being the introduction to a char class.
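To see how that reading parses, here is a sketch that substitutes x for the quote, giving [![x:lower:]]. By the rules above, the inner '[' is followed by 'x' rather than ':', so no character class starts; the bracket expression is the negated set '[' 'x' ':' 'l' 'o' 'w' 'e' 'r', it ends at the ']' after the second ':', and the final ']' is literal. The sketch deliberately avoids the quoted form [![":lower":]], whose handling in bash is the very point under discussion; matches is again a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if WORD ($1) matches PATTERN ($2).
matches() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

# [![x:lower:]] = one char NOT in { [ x : l o w e r }, then a literal ']'.
matches 'a]'   '[![x:lower:]]'    # yes: 'a' is outside the set
matches 'o]'   '[![x:lower:]]'    # no:  'o' is in the set
matches 'a'    '[![x:lower:]]'    # no:  the literal ']' is missing

# Inside *...*, a string matches only if it contains such a pair.
matches 'foo'  '*[![x:lower:]]*'  # no:  "foo" has no ']'
matches 'f]oo' '*[![x:lower:]]*'  # yes: "f]" is such a pair
```

This mirrors the result described at the top of the thread: "foo" does not match, "f]oo" would.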
>
> This is also, I believe (Chet can confirm, or refute, if he desires) where
> bash gets the interpretation that "lower" (including the quotes) is the
> name of the char class in [:"lower":] except that it cannot be, as char
> class names cannot contain quote characters (which should lead to the
> whole sub-expression not being treated as a char class at all, instead
> bash treats it, I think, as if it were an unknown but valid class name).
>
> But when it comes from sh, quote chars are "different" and instead of
> just being characters, they instead affect the interpretation of the
> characters that are quoted.  See XCU 2.2:
>
>         Quoting is used to remove the special meaning of certain characters
>         or words to the shell.
>
>         Quoting can be used to preserve the literal meaning of the special
>         characters in the next paragraph [...]
>
>         and the following may need to be quoted under certain circumstances.
>         That is, these characters may be special depending on conditions
>         described elsewhere in this volume of POSIX.1-2017:
>
>                 * ? [ # ~ = %
>
> to which, as I recall, more chars (I think including ! : - and ]) have
> recently been added by an Austin Group correction, to make it clear
> that in sh
>
>                 [a'-'z]
>
> is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
> of quoting is used to remove the specialness of the '-' is irrelevant),
> and that "[a-z]" isn't a bracket expression at all (neither of which
> is true in an RE, though the role of \ in REs is being altered slightly,
> so if it had been [a\-z] in an RE things would be less clear).
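A short sketch of those two claims. The patterns are written out literally in case, since quote characters stored inside a variable's value would not count as shell quoting. The function names quoted_dash and fully_quoted are hypothetical.

```shell
#!/bin/sh
# [a'-'z]: the quoted '-' is an ordinary member, so the set is { a - z },
# not the range a through z.
quoted_dash() {
  case $1 in
    [a'-'z]) echo yes ;;
    *)       echo no ;;
  esac
}

# "[a-z]": everything is quoted, so there is no bracket expression at all;
# only the literal five-character string [a-z] matches.
fully_quoted() {
  case $1 in
    "[a-z]") echo yes ;;
    *)       echo no ;;
  esac
}

quoted_dash '-'        # yes
quoted_dash 'z'        # yes
quoted_dash 'b'        # no
fully_quoted 'b'       # no
fully_quoted '[a-z]'   # yes
```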
>
> The effect of this is that in sh, in an expression like
>
>         [![":lower":]]
>
> the first ':' is not "special" and hence cannot form part of the
> magic opening '[:' sequence for a character class.   Hence this
> expression contains no character class, and consequently the
> ':]' chars are simply a ':' in the bracket expression, and then
> the terminating ']' - which leaves the second ']' being just a
> literal character.
>
>
> While here (I believe the following parts are not relevant to your
> question), when used in sh
>
>         [[:"lower":]]
>
> should be treated just the same as
>
>         [[:lower:]]
>
> for the same reason that
>
>         ["abc"]
>
> is treated the same as
>
>         [abc]
>
> That is, quoted characters that are not special are no different
> from the same characters unquoted. That's universal in sh: quoting
> removes special meaning (of lots of things), but where there was none
> the quoting changes nothing at all, e.g.:
>
>         "ls" \-'l'
>
> is exactly the same as
>
>         ls -l
>
> and
>         x="foo" y=''
> is identical to
>         x=foo y=
> (though not all empty quoted strings are irrelevant that way).
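The equivalences in the paragraphs above can be checked directly. In this sketch in_abc and show_words are hypothetical helpers; show_words prints each argument wrapped in angle brackets so the resulting word boundaries are visible.

```shell
#!/bin/sh
# ["abc"] and [abc] are the same bracket expression: quoting a, b and c
# removes no special meaning, because they had none.
in_abc() {
  case $1 in
    ["abc"]) echo yes ;;
    *)       echo no ;;
  esac
}

# Print each argument as <arg> so word boundaries are visible.
show_words() {
  printf '<%s>' "$@"
  echo
}

in_abc b                 # yes
in_abc d                 # no
show_words "ls" \-'l'    # <ls><-l>
show_words ls -l         # <ls><-l>

# x="foo" y='' leaves the same values as x=foo y=
x="foo" y=''
[ "$x" = foo ] && [ -z "$y" ] && echo same   # same
```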
>
> There are other issues that are less clear what should happen, if your
> example had been
>
>         [![:"lower:"]]
>
> then we get into very murky water indeed.   XBD 9.3.5 says:
>
>         The character sequences "[.", "[=", and "[:" (<left-square-bracket>
>         followed by a <period>, <equals-sign>, or <colon>) shall be special
>         inside a bracket expression
>
> [aside: not related to my current point, the "shall be special" is what
> enables sh quoting to stop that from happening, since quoting in the shell
> prevents specialness from happening]
>
>         and are used to delimit collating symbols, equivalence class
>         expressions, and character class expressions.
>
> That part (so far) is clear and non-controversial.
>
>         These symbols shall be followed by a valid expression and the
>         matching terminating sequence ".]", "=]", or ":]", as described
>         in the following items.
>
> That's the part that is less clear. When a valid expression and the
> terminating sequence appear, there is no issue, and all is fine; what
> is less clear is what happens when one of those requirements is not met.
>
> Some read this as purely a requirement on the application (what the
> script writer is required to do), and when they don't, the implementation
> (sh or RE library, or whatever) is free to interpret things (which means
> the whole pattern) however it likes (often as not being a pattern at all).
>
> Personally I disagree - I believe it is a requirement on the application
> if it desires the relevant sequence to be interpreted as a char class (etc)
> and if the application does not include a valid expression or terminating
> sequence the implementation should be required to treat the opening
> char sequence as if it did not begin a char class (etc) and the [: were
> simply 2 chars contained in the bracket expression (they must be in
> a bracket expression or the issue doesn't arise at all).
>
> Unfortunately (for the world in general, in that more and more of this
> is becoming unspecified, which makes it harder and harder to know what
> any particular sequence of characters will do) it seems like the former
> interpretation is the more likely to be adopted.
>
> If I have not understood the "this" in your
>
>         where exactly is this documented
>
> please be more precise, and I will try to answer.
>
> kre
>
>
