Re: Word boundary anchors \< and \> not parsed correctly on the right side of =~

From Chet Ramey <chet.ramey@case.edu>
Newsgroups gnu.bash.bug
Subject Re: Word boundary anchors \< and \> not parsed correctly on the right side of =~
Date Tue, 10 Jul 2018 10:37:42 -0400
Message-ID <mailman.3374.1531233460.1292.bug-bash@gnu.org>

On 7/9/18 9:46 PM, marcelpaulo@gmail.com wrote:

> Bash Version: 4.4
> Patch Level: 19
> Release Status: release
> 
> Description:
> Word boundary anchors \< and \> are not parsed correctly on the right side of a =~ regex match expression. 

Bash assumes POSIX extended regular expressions (EREs), and those don't
define any kind of word-boundary anchors.
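
One way to approximate a word boundary using only POSIX ERE constructs is to
match either the start/end of the string or a non-word character on each side.
This is a sketch, not something proposed in this thread: the variable name,
the helper function, and the definition of "word character" (alphanumerics
plus underscore) are assumptions. Unlike a real anchor, the pattern consumes
the neighbouring character, so the matched text may include it.

```bash
#!/usr/bin/env bash
# Sketch: approximate a word boundary with POSIX ERE syntax only.
# The variable name and the pattern are illustrative, not from the thread.
word_re='(^|[^[:alnum:]_])foo([^[:alnum:]_]|$)'

has_word() {  # hypothetical helper: prints yes/no for one input string
  if [[ $1 =~ $word_re ]]; then echo yes; else echo no; fi
}

standalone=$(has_word 'foo bar')   # "foo" delimited by string start and a space
embedded=$(has_word 'foobar')      # "foo" followed by a word character
echo "standalone: $standalone, embedded: $embedded"
```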

> This evaluates as false:
> 
>     [[ 'foo bar' =~ \<foo\> ]]

As it should; in a Posix RE, you're matching a literal <foo>.
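
The literal-match behavior can be checked directly. A minimal sketch, with a
hypothetical helper name and test strings chosen only for illustration:

```bash
#!/usr/bin/env bash
# \< and \> are shell-quoted '<' and '>'. Neither is special in a POSIX
# ERE, so the regex engine receives the literal text <foo>.
check() {  # hypothetical helper: prints yes/no for one input string
  if [[ $1 =~ \<foo\> ]]; then echo yes; else echo no; fi
}

with_brackets=$(check 'say <foo> now')   # contains the literal text <foo>
without_brackets=$(check 'foo bar')      # contains foo, but no angle brackets
echo "with brackets: $with_brackets, without: $without_brackets"
```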

> From the bash reference manual:
> 
>     An additional binary operator, ‘=~’, is available, with the same precedence
>     as ‘==’ and ‘!=’. When it is used, the string to the right of the operator
>     is considered an extended regular expression and matched accordingly (as
>     in regex(3)).

A Posix extended regular expression.

> 
> Reading regex(3), I presumed the regexes would be parsed as C strings, so the
> backslashes would need to be escaped:
> 
>     [[ 'foo bar' =~ \\<foo\\> ]]
> 
> but this results in:
> 
>     bash: syntax error in conditional expression: unexpected token `<'
>     bash: syntax error near `\\<f'

In `\\<', the first backslash quotes the second, which leaves the `<'
unquoted. The `<' is not a character that's special to Posix EREs, so it
retains its usual meaning as an operator and needs to be quoted.

>     
> If the regex is stored in a variable, the expression evaluates as true:
> 
>     re='\<foo\>'
>     [[ 'foo bar' =~ $re ]]

Yes, this is the usual idiom to pass special characters that may include
quotes or operators to the regular expression engine.
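
A small sketch of the idiom, using only POSIX ERE syntax so it does not depend
on any extension. The pattern and the input string are made up for
illustration:

```bash
#!/usr/bin/env bash
# The pattern contains '<', '>', '(', '|' and a space, all of which would
# need shell quoting if written inline. Single quotes keep it intact, and
# the unquoted expansion $re is treated as a regular expression.
re='^<(foo|bar)> [0-9]+$'   # hypothetical pattern

if [[ '<foo> 42' =~ $re ]]; then
  tag=${BASH_REMATCH[1]}    # text matched by the first parenthesized group
else
  tag=none
fi
echo "tag: $tag"
```

Quoting the expansion ("$re") would make bash match the pattern text
literally instead of as a regular expression, so the expansion is left
unquoted here.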

Bash supports the notion that you can quote characters that are special to
EREs to match them literally. To do this, it has to know which characters
are special to EREs, and it uses POSIX to determine those characters. A
system can add extensions to the standard POSIX interface, but bash can't
assume the presence of any such extension.
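
The effect of quoting an ERE-special character can be seen with `.', which
matches any character when unquoted and only a literal dot when quoted. A
minimal sketch; the variable names are illustrative:

```bash
#!/usr/bin/env bash
# Unquoted '.' is an ERE operator: it matches any single character.
if [[ 'axb' =~ a.b ]]; then unquoted_dot=yes; else unquoted_dot=no; fi

# Backslash-quoted '.' is matched literally, so 'axb' no longer matches.
if [[ 'axb' =~ a\.b ]]; then escaped_dot=yes; else escaped_dot=no; fi

# Double-quoting has the same effect: only a literal dot matches.
if [[ 'a.b' =~ a"."b ]]; then quoted_dot=yes; else quoted_dot=no; fi

echo "unquoted: $unquoted_dot, escaped: $escaped_dot, quoted: $quoted_dot"
```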

> 
> Treating the regex as C strings works for the \b anchor, so that this evaluates as true:

Because `b' is not an operator and doesn't need to be escaped.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
