From: Oğuz
Newsgroups: gnu.bash.bug
Subject: Re: quote removal issues within character class
Date: Sat, 9 Nov 2019 16:45:31 +0200
To: Robert Elz
Cc: bug-bash@gnu.org
References: <16736.1573257142@jinx.noi.kre.to> <14879.1573303743@jinx.noi.kre.to>

You've already answered it, thank you. I didn't know that [:, [., [= were
special *sequences*; I guess I overlooked that part. Thanks again for taking
the time to explain it in detail, I'm grateful.

On Saturday, 9 November 2019, Robert Elz wrote:

>     Date:        Sat, 9 Nov 2019 07:35:16 +0300
>     From:        Oğuz
>     Message-ID:  <CAH7i3Lr68CiVXLR9_HoOgQa7Vd-zyVZ+fck-0K3uQPTNSirU2Q@mail.gmail.com>
>
>   | > is correct, as "foo" does not contain a ']' which would be required
>   | > to match there (quoting the ':' means there is no character class,
>   | > hence we have instead (the negation of) a char class containing '[' ':'
>   | > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
>   | > followed by ']' and anything. foo does not match. f]oo would.
>   | >
>   | where exactly is this documented in the standard?
>
> I'm not sure which part exactly you're looking for, but char sets in sh
> are specified to be the same as in REs, except that ! replaces ^ as the
> negation character (that's in XCU 2.13.1).
> Char sets (bracket expressions) in RE's are documented in XBD 9.3.5,
> wherein it states
>
>     A bracket expression is either a matching list expression or a
>     non-matching list expression. It consists of one or more expressions:
>     ordinary characters, collating elements, collating symbols,
>     equivalence classes, character classes, or range expressions.
>     The <right-square-bracket> (']') shall lose its special meaning and
>     represent itself in a bracket expression if it occurs first in the
>     list (after an initial <circumflex> ('^'), if any).
>
>     Otherwise, it shall terminate the bracket expression,
>
> That is, a ']' that occurs anywhere else terminates the bracket expression
> except:
>
>     unless it appears in a collating symbol (such as "[.].]")
>
> (not relevant in the given example)
>
>     or is the ending <right-square-bracket> for a collating symbol,
>     equivalence class, or character class.
>
> So the ']' that immediately follows the second ':' would not terminate the
> bracket expression if it is the ending ']' for a character class
> (collating symbols and equiv classes not being relevant to the example).
> Of course, that can only happen if there is a character class to end.
>
> There's also
>
>     The special characters '.', '*', '[', and '\\'
>     (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>     respectively) shall lose their special meaning within a bracket
>     expression.
>
> whereupon if the [": sequence does not start a char class, the '[' there
> is simply a literal char inside the bracket expression.
>
> Similarly, if the bracket expression ends at the first ']' (the one
> immediately after the second ':') the following ']' is simply a literal
> character, as ']' chars are special only when following a '['.
>
> So, all that's left to determine is whether the [": sequence can be
> considered as beginning a char class.
>
> In a RE it certainly cannot - quote chars (' and ") are not special in
> REs at all, and [": is no different syntactically than [x: which no-one
> would treat as being the introduction to a char class.
>
> This is also, I believe (Chet can confirm, or refute, if he desires) where
> bash gets the interpretation that "lower" (including the quotes) is the
> name of the char class in [:"lower":] except that it cannot be, as char
> class names cannot contain quote characters (which should lead to the
> whole sub-expression not being treated as a char class at all; instead
> bash treats it, I think, as if it were an unknown but valid class name).
>
> But when it comes from sh, quote chars are "different" and instead of
> just being characters, they affect the interpretation of the
> characters that are quoted. See XCU 2.2:
>
>     Quoting is used to remove the special meaning of certain characters
>     or words to the shell.
>
>     Quoting can be used to preserve the literal meaning of the special
>     characters in the next paragraph [...]
>
>     and the following may need to be quoted under certain circumstances.
>     That is, these characters may be special depending on conditions
>     described elsewhere in this volume of POSIX.1-2017:
>
>         * ? [ # ~ = %
>
> to which more chars have been added (as I recall) recently by some
> Austin Group correction (which I think includes ! : - and ]), that is
> to make it clear that in sh
>
>     [a'-'z]
>
> is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
> of quoting is used to remove the specialness of the '-' is irrelevant),
> and that "[a-z]" isn't a bracket expression at all (neither of which
> is true in an RE - though the role of \ in RE's is being altered slightly
> so if it had been [a\-z] in a RE things are less clear.)
>
> The effect of this is that in sh, in an expression like
>
>     [![":lower":]]
>
> the first ':' is not "special" and hence cannot form part of the
> magic opening '[:' sequence for a character class.
> Hence this expression contains no character class, and consequently the
> ':]' chars are simply a ':' in the bracket expression, and then
> the terminating ']' - which leaves the second ']' being just a
> literal character.
>
>
> While here (these following parts are not relevant to your question, I
> believe), when used in sh
>
>     [[:"lower":]]
>
> should be treated just the same as
>
>     [[:lower:]]
>
> for the same reason that
>
>     ["abc"]
>
> is treated the same as
>
>     [abc]
>
> That is, quoted characters that are not special are no different
> than the same character unquoted. That's universal in sh: quoting
> removes special meaning (of lots of things) but where there was none
> the quoting changes nothing at all, eg:
>
>     "ls" \-'l'
>
> is exactly the same as
>
>     ls -l
>
> and
>     x="foo" y=''
> is identical to
>     x=foo y=
> (though not all empty quoted strings are irrelevant that way).
>
> There are other issues where it is less clear what should happen. If your
> example had been
>
>     [![:"lower:"]]
>
> then we get into very murky water indeed. XBD 9.3.5 says:
>
>     The character sequences "[.", "[=", and "[:" (<left-square-bracket>
>     followed by a <period>, <equals-sign>, or <colon>) shall be special
>     inside a bracket expression
>
> [aside: not related to my current point, the "shall be special" is what
> enables sh quoting to stop that from happening, since quoting in the shell
> prevents specialness from happening]
>
>     and are used to delimit collating symbols, equivalence class
>     expressions, and character class expressions.
>
> That part (so far) is clear and non-controversial.
>
>     These symbols shall be followed by a valid expression and the
>     matching terminating sequence ".]", "=]", or ":]", as described
>     in the following items.
>
> That's the part that is less clear. When a valid expression and the
> terminating sequence appear, there is no issue, and all is fine - what
> is less clear is what happens when one of those requirements is not met.
>
> Some read this as purely a requirement on the application - what the
> script writer is required to do - and when they don't, the implementation
> (sh or RE library, or whatever) is free to interpret things (which means
> the whole pattern) however it likes (often as not being a pattern at all).
>
> Personally I disagree - I believe it is a requirement on the application
> if it desires the relevant sequence to be interpreted as a char class (etc)
> and if the application does not include a valid expression or terminating
> sequence the implementation should be required to treat the opening
> char sequence as if it did not begin a char class (etc) and the [: were
> simply 2 chars contained in the bracket expression (they must be in
> a bracket expression or the issue doesn't arise at all).
>
> Unfortunately (for the world in general, in that more and more of this
> is becoming unspecified, which makes it harder and harder to know what
> any particular sequence of characters will do) it seems like the former
> interpretation is the more likely to be adopted.
>
> If I have not understood the "this" in your
>
>     where exactly is this documented
>
> please be more precise, and I will try to answer.
>
> kre
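
For reference, the uncontroversial points above can be checked with a small sh sketch along these lines. The function names are made up for illustration and appear nowhere in the thread. The comments state only what the quoted standard text implies. The disputed [:"lower":] forms are deliberately left out, since shells may differ on them.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only). Each prints
# "match" or "no match" for its argument against one case pattern.

# A quoted '-' is an ordinary character: [a'-'z] lists 'a', '-' and 'z'.
quoted_dash() {
    case $1 in
        [a'-'z]) echo "match" ;;
        *)       echo "no match" ;;
    esac
}

# A fully quoted "[a-z]" is not a bracket expression at all; it only
# matches the literal five-character string [a-z].
quoted_bracket() {
    case $1 in
        "[a-z]") echo "match" ;;
        *)       echo "no match" ;;
    esac
}

# Quoting characters that were never special changes nothing:
# ["abc"] behaves like [abc].
quoted_plain() {
    case $1 in
        ["abc"]) echo "match" ;;
        *)       echo "no match" ;;
    esac
}

# A ']' placed first in the list is literal: []a] lists ']' and 'a'.
leading_bracket() {
    case $1 in
        []a]) echo "match" ;;
        *)    echo "no match" ;;
    esac
}

quoted_dash b            # 'b' is not one of 'a', '-', 'z'
quoted_dash -
quoted_bracket b
quoted_bracket '[a-z]'
quoted_plain b
leading_bracket ']'
```

Only the first call, quoted_dash b, depends on the quoted '-' losing its range meaning. The remaining calls are sanity checks on the other cases kre mentions.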